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ABSTRACT 

This discussion of time series data produced by random physi- 
cal processes emphasizes astrophysical data analysis. Several ran- 
dom process models phrased in the time domain are defined and dis- 
cussed. The moving average (MA) model represents the data as a 
sequence of pulses occurring randomly in time, with random ampli- 
tudes. The autoregressive (AR) model represents the correlations 
in the process in terms of a linear function of its past values 
and is closely related to the differential equation describing the 
dynamics of the system. A given stationary process always has both 
a MA and an AR representation, and one can easily be transformed 
into the other using the discrete Fourier transform. The moving 
average form is usually more suitable for interpretation, as the 
pulses and pulse amplitudes often have direct physical significance 
But the AR parameters are easier to determine from the time series 
data. Hence the procedure is to determine the best AR model from 
the sampled data, and then transform it to a MA for interpretation 
and comparison with theory. The technique for determining the AR 
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parameters is based on interpreting the AR model as a filter which, 
when applied to the data, yields the sequence of pulse amplitudes. 
The parameters are adjusted to maximize the randomness of the pulse 
amplitudes — that is, to make them as statistically independent as 
possible. (It is not enough to make the amplitudes uncorrelated, 
or white.) This maximization is implemented by specifying that the 
joint cumulative probability function of the pulse amplitudes be as 
close as possible to the product of the individual cumulative dis- 
tribution functions. A procedure for carrying this out is presented 
as a FORTRAN algorithm which has proven to be relatively stable 
numerically. Results of test cases are given to study the effects 
of adding noise and of different distributions for the pulse ampli- 
tudes. A preliminary analysis of the optical light curve of the 
quasar 3C 273 is given. 

I. INTRODUCTION: ASTRONOMICAL TIME SERIES 

This mostly self-contained introduction to time domain models 
of intrinsically random physical processes is directed toward 
astronomers and scientists in related fields, particularly those 
involved in the analysis and interpretation of data. The goals are 
to develop an intuitive understanding for this view of random pro- 
cesses and to give specific numerical techniques for the analysis 
of time series data. Many of the concepts presented here have been 
developed in other literatures, especially those of geophysics, 


economics, and speech analysis. Appropriate references will be 
given; although the terminology and basic philosophy will be some- 
what different, the reader is urged to consult these references. 

Of particular value are the following reviews, which parallel the 
present work in their viewpoint and emphasis on applications to 
data analysis: Wold (1964) (especially the two chapters by E. A. 

Robinson), Robinson (1962, 1967b), Box and Jenkins (1970), 

Kanasewich (1975), Claerbout (1976), and Granger and Newbold (1977). 
Reviews of stochastic processes in astronomy are given by Deeming 
(1970), Rothschild (1977), and Press (1978). A pioneering paper in 
the application of time domain models of random processes in astron- 
omy is Fahlman and Ulrych's (1975) analysis of the optical light 
curve of 3C 273 [see also Ulrych and Clayton (1976) and Ulrych and 
Bishop (1975)]. There are several books devoted to explicit com- 
puter codes for some of the operations discussed here (Simpson 
1966; Robinson 1967a; and Enochson and Otnes 1968). Texts are 
available on the following related topics: time series analysis 

(Hannan 1970; Anderson l 0 "”!), stochastic processes (Doob 1953; 

Parzen 1962; Bailey 1964; Papoulis 1965), prediction and optimiza- 
tion theory (Wiener 1949; Whittle 1963; Luenberger 1969), and 
probability theory (Feller 1957; Parzen 1960). There are also 
several interesting collections of related papers (Wax 1954; 
Rosenblatt 1963; Parzen 1967; and Krishnaiah 1969). The December 
1974 issue of the IEEE Transactions on Automatic Control was devoted 
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to system identification and time-series analysis [see the papers 
by Hannan (1975), Akaike(1975) , and Parzen (1975); see also Kailath 
(1974)]. For an extensive bibliography (roughly 10,000 entries) 
on time series and stochastic processes, complete through 1959, as 
well as an interesting "graphic introduction to stochastic processes" 
see Wold (1965) . 

Data from astronomy as well as from other physical and biologi- 
cal sciences often consist of a sequence of numbers, 

{^j, ^ 2 » ^ 3 » • • •» %), obtained by measurement of quantity I at a 
set of times, (tj, t 2 » £ 3 * • • •» • Such a sequence is a time 

series , and the data are time series data . The sample time series 
in Figure 1(a) illustrates a feature common in astronomical obser- 
vations, brought about by practical considerations such as observ- 
ing schedules, weather, equipment malfunction, etc.: the time 

points t- are not evenly spaced. (It is then said that the sampling 
is uneven.) Several ways of graphically indicating to what degree 
the sampling is uneven are demonstrated in parts (b) , (c), and (d) 
of the figure. Sometimes it is assumed that X is actually constant, 
and the repeated measurements are made to reduce- the uncertainty 
due to observational errors — such data are not really time series 
data, because the serial or sequential nature of the observations 
is irrelevant (i.e., the time-ordering contains no useful informa- 
tion). This paper deals only with the situation where X may undergo 
real variations with time, and the sequential nature of the 
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observations is crucial to the elucidation of the variations. The 


goal of the analysis — once the existen e of such variations has 
been established — is the extraction of information about the physi- 
cal process which gives rise to the variations. 

This goal is usually approached by identifying a pattern in 
the observed variations and then trying to uncover the cause or 
explanation of the pattern, often in terms of a physical model. For 
example, the pattern may consist of a definite functional dependence 
of X on t, such as a linear variation or a harmonic oscillation 
partially hidden behind noise. One then attempts to fit to the data 
a function (or model), the form of which is usually suggested by 
prior knowledge, physical understanding or guesswork. This fitting 
is usually carried out by minimizing, with respect to the model 
parameters, a measure of the difference between the model and the 
observations. This measure is usually defined as the sum of some 
positive-definite function of the point-by-point difference between 
the model and the data. The most common such measure is the sum- 
of-squares of the .Y-dif ferences, and the result is the ubiquitous 
least-squares procedure. 

But what if there is no consistent pattern to the data? It may 
be, for example, that the data come from a physical system that is 
random. In come cases the process is intrinsically random because 
of quantum mechanical effects — for example, a radioactive decay 
process. In ethers, one should perhaps say the process is 
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effectively random, because detailed knowledge of the initial con- 


ditions and of the governing physical laws might yield predictabil- 
ity (nonrandomness) for the system, but such knowledge may be 
virtually impossible or simply not practical. This situation is 
increasingly important in astrophysics, and examples could be cited 
from many areas, especially X-ray and radio astronomy. Is there 
any physical information to be extracted from such random data? 

The answer is yes, and the basic subject of this paper is the 
modeling of random processes to obtain concise and useful descrip- 
tions of the underlying physical processes. The discussion of the 
fundamental concept of random process in §11 is oriented toward 
astrophysical data analysis and description in the time domain. 

Just as with deterministic processes, there is an infinite variety 
of possible forms or models which can be used to describe random 
processes. Familiar examples are shot noise models (Terrell and 
Olsen 1970; Terrell 1972), random walks (Wax 1954), diffusion models 
(Wax 1954), Markov chains (Doob 1953), discrete branching processes, 
birth and death processes, competition and predation, queueing pro- 
cesses (Bailey 1964), and other special’ ~od techniques (e.g., 
Chandrasekhar and Munch 1951). In ill e descriptions of several 
types of models which are less familiar to astronomers, though 
ironically the models originated long ago in an astrophysical con- 
text (Yule 1927), namely the analysis of sunspot data. These models 
are emphasized here because of their direct physical interpretations 
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[e.g., in terms of randomly occurring pulses (§111)] and because of 
their very general applicability (§IV). A common feature of these 
models is their simple and explicit separation of the, nonrandom 
from the random parts of the process ; this feature is responsible 
for their usefulness, because such a separation usually has a clear 
physical basis — i.e., the random and nonrandom parts correspond to 
fundamentally different aspects of the process. Such a separation 
is assured only for stationary processes (defined in ilia). We 
shall almost always assume that we are dealing with physical pro- 
cesses that satisfy the stationarity condition. For practical 
reasons we shall always assume that the time sampling is discrete 
(see ilia) rather than continuous. All processes will be assumed 
ergodic — i.e., such that time averages (determined from one 
realization) are the same as statistical averages (determined from 
an ensemble of realizations). In addition, non-Gaussian processes 
will play an important role, because Gaussian processes cannot be 
unambiguously modeled in the way mentioned (see §IV). Model con- 
struction procedures are outlined in § IV ; computational details 
appear in §V, and examples of the computations are presented in 
§VI. The Appendix contains a description of the algorithm, together 
with FORTRAN code, for the deconvolution of time series using cumu- 
lative distribution functions. 
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II. MODELING RANDOM PROCESSES IN THE TIME DOMAIN 


This section begins with a brief account of the theory of 
random processes. Rather than a rigorous mathematical treatment, 
it is an informal heuristic discussion emphasizing a particular 
context — namely the interpretation of time series data produced by 
a physical process which is at least partly random. This situation 
is common in astrophysics as well as nearly all other quantitative 
acieuces. Interpretation often means the construction of a model 
of the physical process. This section will discuss several ways of 
mathematically modeling a random process in the time domain. Fre- 
quency domain techniques, such as power spectrum analysis, are most 
useful when harmonic variations are present but are less suited to 
random variations. Two goals of this paper are to demonstrate the 
richness and usefulness of time domain analysis, and to indicate 
the type of problem for which it is superior to frequency domain 
analysis. The text by Box and Jenkins (1970) provides a good over- 
view of this subject. The paper by Shinners (1974) is an interest- 
ing and practical discussion of the application of modeling tech- 
niques to human behavior. 

a) Time Seri.es and Random Processes 

Consider a physical variable X that can be measured as a func- 
tion of time t. In practice the values of t are not continuous but 
discrete because data recording equipment is capable of sampling 
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the observed quantity only at a finite number of times, separated 


by some minimum time interval. There is thus a finite series of 
values of t, {t^}, i = 1, 2, 3, . .V. The corresponding values 

of X form the set { X ^ = X(tj)}, i = 1, 2, 3, . . . , N. Often the 
values of t can be chosen to be evenly spaced, so that t ^ = tA£, 
where At is the constant interval between the times of observation. 

In any case the set of numbers { X is called a time series . 

Figure 2 shows an example of a discrete, evenly spaced time series. 
Despite the name, time series are not limited to functions of time, 
which here stands for any independent variable of interest. Other 
examples are: position in space (three-dimensional), position on 

the sky (two-dimensional), and wavelength (one-dimensional). Because 
the term time series is used in all cases, it should be kept in 
mind that t may stand for a variable other than time, possibly of 
multiple dimensionality. Sometimes the term sequential analysis 
is used in place of time series analysis to emphasize the key 
property that the numbers X £ are sequentially related to each other. 
The dependent variable X may also be of multiple dimensionality. 

A process is a rule or procedure that generates time series. 

That is, it is a prescription for determining the values of X for 
a given set of values of t and may or may not include a random 
element. Each such time series is called a realization of the 
process, and it is important to distinguish the process from a spe- 
cific realization. The process can be identified with the set of 
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FIG. 2. -This artificial time series consists of a sequence of 
decaying exponential pulses occurring randomly in time in the sense 
that the amplitude of the pulse starting at any given time is a 
random variable. The sequence of pulse amplitudes was obtained by 
raising a sequence of random variables uniformly distributed on 
(0,1) to the ninth power. The horizontal axis represents time, 
which is discrete and evenly spaced, although straight lines have 
been drawn through the data points to give the curve more of the 
appearance of a continuous function. The apparent trend of dimin- 
ishing amplitude with increasing time is spurious — the process 
generating these data is completely stationary. 
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all possible realizations of it. Figure 3 shows two more realiza- 
tions of the same process which generated the time series in 
Figure 2. 

The most interesting processes are those for which the rule 
generating the time series specifies probability distributions of 
the A'.*, rather than specific values that are the same at every 
realization. In this case we have a random process , which can be 
thought of as a set of random variables, f A',- } . For precise defini- 
tions and discussions of random variables the reader is referred tc> 
any text on probability or stochastic processes (e.g., Feller 1957; 
Parzen 1960, 1962). It is merely stated that a random variable, 

A’,-, can be specified by giving its probability distribution, Fy., 
defined such that 

Py. (x)dx = ri'{x <; A". < .r + dx) (1) 

in the usual limiting sense.* In many cases two random variables 


*Pi‘{* ) stands for the probability of event •. In these defi- 
nitions and elsewhere we shall use capital letters for the process 
(A) or random variable (A’.-), and lower case for specific values of 
the random variable (e.g., x ) . 


are related to each other, e.g., knowledge of the value of one may 
provide information about the other. There are two important 
definitions concerning the degree of such relatedness: two random 

variables, A' and Y, are said to be 
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INDEPENDENT (of each other) if their joint probability dis- 
tribution function equals the product of their individual 
probability distribution functions: 

P X y(x t y) ■ P x (x)P y (y), for all x and y 

and 


UNCORRELATED if the expected value of their product equals 
the product of their expected values: 

ay> = axy> 

The joint probability distribution P XY is defined by 


P X y(x>y)dxdy = Pr{x $ X g x + dx and y < Y < y + dy) . (2) 
The notation <• > is used for the expected value of :..e quantity •: 



P x (x)q(x)dx . 


(3) 


The more familiar definition of uncorrelation is for the case where 
the processes are assumed (or made) to be zero-mean, so that (XY) 
also vanishes. Note that independence is the stronger of the two 
properties; it is easy to show that independence implies uncorrela- 
tion, but not vice versa. This is a key fact, and later we shall 
deal with variables that are uncorrelated with each other, but are 
not independently distributed. There is a third property, inter- 
mediate between independence and uncorrelation: 
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X has the MARTINGALE DIFFERENCE PROPERTY (MDP) with respect 
to Y if the conditional expectation value of X (given the 
value of Y) is the same as the unconditional expectation 
value of X\ <Y|Y> * (X) . 


The name Martingal Difference Proper,./ t'Segall 1976), is based on 
the fact that this kind of process is to a martingale as an indepen- 
dently distributed process is to a process with independent incre- 
ments. (Martingales and processes with independent or uncorrelated 
increments are defined in continuous time and will be of no concern 
here.) It can be shown that if X and Y are independent, they each 
have the MDP with respect to the other; in turn, if X has the MDP 
with respect to Y, then X and Y are uncorrelated. 

Let us now be more precise with the definition of a process, 
which was already defined as a set of random variables. Take the 
set to be finite, with N members. The process is completely speci- 
fied by giving any one of the following functions: 

(1) The complete joint probability distribution function 


P 


' (^ ] , 3 • • • <-tX ^ 


= PrOj £ Tfj i Xj + cixj and x 2 < .Y 0 < x 2 + dx n 


and . . . and x N & X N & x N + dx,,} 


N 


(4) 


(2) The joint cumulative distribution function 

Py y (Xp x„ , .... x v ) = JVU' s x. and A' 0 i x„ 

i ‘ *' 4 


and . . . and X^ < x^) (5) 
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(3) The joint characteristic function 


Y 

A 1 * A 2* ' 


y 0 ^ 1 » ^ o * 

* X N 


• “,v> 



+ UjA', + . 



( 6 ) 


Equations (5) and (6) are straightforward generalizations of the 
individual cumulative distribution function 

F x (x) « PH A' < .r) (7) 


and the characteristic function 

4> x (u) * (exp (iuX) > (8) 

of a single random variable X. One can define what is called the 
moment-generating function by dropping the i in the definition of 
the characteristic function, but it does not always exist and is 
therefore of less theoretical importance. Nevertheless, it is of 
some practical use because of the concise way the nature of a 
variable can be expressed in terms of its moments. 

We shall now distinguish several degrees of randomness. It is 
convenient to define these categories in terms of predictability. 

A process is said to be deterministic if, based on past observations, 
the future of the process can be predicted exactly (i.e., with zero 
error). An example of such a process is one with no probabilistic 
element at all, such as the sinusoid Xj * sin(i + <J>); in this 
case all realizations are the same. However, there are purely 
deterministic processes for which each realization is different. 

The above sinusoid would be an example if the phase ^ere a random 
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variable, fixed during each realization but chosen randomly each 
time — each realization would be exactly predictable once the phase 
had been determined by observation. An example of a deterministic 
process from astronomy would be a perfectly regular variable star. 

A random process , on the other hand, is not perfectly predict- 
able. Even if the rule generating the time series is known com- 
pletely, it has a stochastic nature. Different realizations are 
therefore different and share only statistical properties (cf. 

Figs. 2 and 3). Discussions of the concept of prediction of time 
series can be found in texts by Whittle (1963), Robinson (1964b), 
Hannan (1970) , and Granger and Newbold (1977) . For the present 
purposes the important point is that while past observations may 
provide useful predictive information, for a random process there 
is nevertheless always some uncertainty or error in the predictions, 
even in the limit that the available data extend infinitely into 
the past. A case of particular importance is that in which past 
data provide no information about present or future values. (This 
must be made precise, because observations of the past provide some 
statistical information no matter how random the process: Because 

of stationarity , the mean value derived from past data is the best 
prediction for X n ) . In such cases there is no deterministic ele- 
ment, so the process can be cai led purely random . As with individ- 
ual random variables there are three degrees of lack of determinism 
which it is crucial to distinguish. 
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The first is independence . A process is independently distrib- 


uted (i.d.) if all of the random variables are independent of each 
other. Then the past provides no information about the present. 
There are four equivalent conditions which are necessary and suffi- 
cient that A'j , A,. . . . , X>,j are independent; i.e., that the pro- 
cess X is independently distributed (Parzen 1962) : 

(1) In t t rms of probability distributions : for all real 

numbers Xj , x ? . . . . , x^. 


A J | An , 


y j n 1 

, A 


“ i V V (X 2 ) 


"A' 

A; 


(9) 


(2) In terms of cumulative distribution functions : for all 

real numbers Xj, x ? , . . . , 

■ y y y (^* j * • • » A*, f ) 


A y ( A" 1 ) y (x^) • • • Ay (X. f) . 

Ai ''2 ,>,< "• 


( 10 ) 


(3) In terms of characteristic functions : for all real num- 

bers ti . . u . . • • « nr 

1 . i’i 


V y y ( ‘< 1 t ^ o f • • • • <^1/) 

A j , A n , . ... A^ i *- A 


ty (MjHv. ( U ,) . . • <* (u ) 

A 1 •'? ‘ r M •' 


(4) In terms of expectations : for all functions 

L J 1 * n * • - • * % 

^1 (Aj )<? o ( A ^ ) . . . A, ( )^ 


(ID 


(121 
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provided all of the expectations indicated in this equation exist. 


1 


I 


These relationships must hold for M - 2, 3, . . ., N. If, in addi- 
tion, the X- all have the same individual distributions, then X is 
said to be identically and independently distributed (i.i.d.). 
Independence is the strongest form of lack of relation and absence 
of predictability. The term purely random will be reserved for 
independently distributed processes. 

A second and weaker description of a process is that it is 
uncorrelated . For a process with zero mean value, this means that 
the autocorrelation function vanishes for all except zero lag; 
that is, 

p(W 3 (x n x m ) = (13) 

( 6 n m is the Kronecker delta, which vanishes if n t m, and is unity 
for n = a 2 - CO . ) Since (Vm> is zero if X n and X m are 
independent of each other and can be nonzero otherwise, the auto- 
correlation function contains some information about dependence. 

Its vanishing implies a degree of lack of mutual dependence — but, 
as we shall see, not total ence. 

We will dv.al almost exclusively with stationary processes. 

Most discussions of stationary random processes assume that the 
mean value of all processes is zero — because if it is not, the 
constant mean can be subtracted. If 

X n = X n ~ W • (U) 


V 


19 


the new process X ’ has zero mean. However, this will not be done 
because there are cases where the positiva definite nature of a 
signal is crucial (e.g., the examples in Figs. 2 and 3). This 
matter will be discussed further in SVIf. 

Figure 4 shows examples of four types of orocesses: determinis- 

tic, random, uncorrelated, and independently distributed. Note 
particularly the process depicted in part (c), which is uncorrelated 
but not independently distributed. (This , ;ss will be examined 
in detail in SIVb.) Another example of an uncorrelated but depen- 
dent process can be constructed as lcllows: !' *■ be any zero- 

mean random variable. Define X 2 * 82 ^ 1 , where s 2 is randomly +1 or 
-1 with equal probability (p *■ 1/2). In general let X n * B n X^, 
where the s n are defined similarly to s 2 , but are independent of 
each other and of s 2 . It is easy to show that < X n X m > * 0 for m + n, 
because ? 2 (X n ,X m ) is an even function of at least one of its argu- 
ments. ">!>t the X n are most definitely not independent, as 
\X n \ ■ |Xj| for all n > 1. On the other hand, it is straightforward 
to show that if a process is Independently distributed, then it is 
u .correlated. Most data arise from a process which has a random 
aspect to it but is neither uncorrelated nor independently distrib- 
uted; such is called a partially random process. In general a 
process can contain both deterministic and random components. Indeed, 
it can be shown that any stationary process^ contains only these 

^A stationary process is one whose statistical properties do 
not depend on time. Strict stationarity means that a] l of the joint 
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rifi. 4. -Time scries produced by four different types of pro- 
cesses (left) and the corresponding autocorrelations (right). The 
dashed line is the theoretical autocorrelation, and the solid line 
is the estimate from the realization shown. The processes are: 

(a) a sine wave, (b) a moving average, (c) a moving average with the 
uncorrelated pulse shape shown in Figure 17, and (d) independently 
distributed noise with a highly nonnormal distribution. [The auto- 
correlation of the sine wave in part (a) is damped because a finite 
realization was used to compute it.] 
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probability distributions are Invariant to a translation of time. 


There are other kinds of statlonarltv that are less restrictive, 
but we will not need to distinguish between them. 

two components, anil the separation between them can be written in a 
surprisingly simple and explicit form. This separation, called the 
Wo 1 d d e c ompp s 1 1 lo n . will be discussed in detail in §IV. 

It may seem strange, especially to the reader unfamiliar with 
the econometric approach to time series analysis (Wold l‘>t>4), that 
so much emphasis is put on prediction. But the relationship between 
prediction and statistical description is clear: a good prediction 

of the values of a process depends on good knowledge of its statis- 
tical properties. it will be seen that the concept of prediction 
must be extended to include the use of future data (i.e., estimation 
of A,, based on A,. + 1 , A',. + „ , . . .) as well as past data. That is, 
one pretends that A',, is unknown and tries to estimate or predict its 
value based on knowledge of the neighboring values A' ( , Y (i( ..... 

This approach leads to the concept of a two-sided (acausal) prediction- 
error filter, which forms the basis of the techniijuc to be described 
in HIV for the extraction of information from time scries data. 

The ability to know when two random processes, say A and V, arc 
really the same is important. This does not mean that specific 
real i /.at ions of the processes are equal point -by-point (i.e., 

\ r “ Y r for all /:) because even different realizations of the same 


random process are not equal point-bv-point . What Is meant is that 
the probabilistic rules for A' and Y are the same. Specifically, 
the joint probability functions listed in §1 must be identical. 


b) Whit: Noise; Independently Distributed Noise 


Of special importance is the class of random processes R which 
satisfy all three of the following conditions: 

(1) (R ) = 0 (zero mean value), 

(2) <K,, 2 > = o'- < ® (finite variance), and 

(3) = 0 for m j >; (uncorrelated). 

Such a process is called white noise . Nothing is said in this 
definition about the probability distribution of R. There are many 
different kinds of white noise, according to the probability dis- 
tribution. Gaussian, or normally distributed noise is very common, 
because of the fact expressed in the Central Limit Theorem. t It is 


f The sum of independent random variables with any distribution 
tends to be normally distributed as the number of variables 
increases (Claerbout 1976, 54.5). 

also not necessarily true that the R f , be independently distributed, 
i.e. , that a be statistically independent of R r , for n t r;. White 
noise may be independen t ly distribu ted nois e or just uncorrel a ted 
noise. Both are "white" because the power spectrum of an uncorre- 
lated process (and therefore of any independently distributed 
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process) is constant with frequency. Figure 5 and Figures 4(c) 
and 4(d) are examples of white noise with various distributions. 

Note further that only the second moment of R has been specified. 

The third and higher moments E(R f ,R rr; R^), etc., are not determined, 
although they are not completely arbitrary either, as they must 
conform to conditions (1) through (3) above. 

c) The blovi-Hij Average (blA) Model 

A model of a random process is an explicit mathematical descrip- 
tion which is usually an attempt to describe a physical process in 
simple terms. It often involves a relatively small number of param- 
eters, the values of which are to be determined by some procedure 
using the observed time series data (i.e., one or more realizations 
of the process). An extremely useful model is the moving average ^ 


^Unfortunately this term is also sometimes used for the proce- 
dure of smoothing data with a running mean, formally similar to the 
summation involved in the MA. 


(MA) . An MA is a process in the form (% R . , where o' is a 

k K n ~ K 

white noise process and the ('• are constants. The array of con- 
stants C = {i\. } is called a f ilter or linear system . The reason for 
this terminology is that the above expression describes the output 
of an electrical filter into which is put a random sequence R of 
impulses (noise). That is, regarded as a function of discrete 




time k describes the shape of a pulse that would result from an 
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impulsive or delta-function input; is the impulse response of 
the filter. This is easily seen by letting R be set equal to a 
delta function at >: = (i.e., R n = n <) , which then yields 

X n = — that is the pulse { c" • } with its origin, i = 0, shifted 

to time n' . It is easily seen that if there are several or many 
non-zero values of R r , each one produces a pulse at time K, of 
amplitude R The net result is a sequence of overlapping pulses. 
The interpretation of the MA as filtered noise is illustrated in 
Figure 6. The time series in Figures 2 and 3 are also MA's. The 
closely related shot noise process will be discussed below, in §IIh. 

In most discussions c. the MA the restriction is made that 
C n = 0 for r. < 0. This condition is called causality , and such a 
filter is said to be causal because a nonzero value at a negative 
time would correspond to a response of the filter at a time prior 
to the input. (The point >i = 0 will be called the origin of time 
for the pulse.) In some contexts this acausality would be unphysi- 
cal, and it is convenient to restrict filters to respond only at 
and after the input, i.e., the filter can possess a memory but not 
premonition. However, for a number of reasons it is frequently 
useful or even necessary to relax this restriction. One reason is 
that it is often convenient to identify the origin of time for a 
pulse with a point near the peak rather tiian with the time of the 
cause of the pulse. For time series in which the independent 
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FIG. 6. The moving average (MA) process depicted in terms of noise passed through a filter. 
The nois° process shown is positive only, and the filter is roughly exponential in shape. 



variable is not time the concept of causality is obviously of 
limited value. There is no Arrow of Space, or Arrow of Wavelength, 
as there is an Arrow of Time. Other reasons for dispensing with 
causality will be mentioned as they arise below. For the present, 
it should be simply noted that a filter is a set of numbers 
where n may take on negative as well as positive values. In prac- 
tical computations, of course, n takes on a finite number of values, 
say — q , — ^ d - 1 , ~~c\ + 2, . • • , —2 , —1 , 0, 1, 2, . • * , p — 1 , p . 

The case q = 0 is the conventional one-sided or causal pulse and 
corresponds to a MA process of order p, abbreviated MA(p) . The 
general case will be called a two-sided MA of order p, q, or MA(p,p). 

An interpretation of the MA of interest in the economic appli- 
cations (Wold 1964) is that the pulses represent the reaction or 
response of some system to news or information which arrives in 
discrete impulses. The effect of the news persists for some time 
(memory) but eventually dies out. This suggests a condition that 
the C n get smaller as ti gets large. In addition, it is convenient 
to allow the mean value of the input process R to be nonzero. For 
example, in some cases the pulse amplitudes must be positive because 
of their physical significance, as when the pulses are outbursts of 
radiation. If the mean value of the input is positive and the 
pulse shape has a positive "area" or total strength, the mean of 
the output is also positive, since <A'> = (R*C) = (R) ( 2 <.’,,) . 

A' ^ 
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The above statements are summarized in the following 


DEFINITION : A MOVING AVERAGE (MA) is a process A' which 

can be written in the form: 


C.S . (A = C*E) , 

; u- ; 


where E is an uncorrelated white noise process, possibly 
with nonzero mean: 

<(E U - E)(E„ - E)> - o 2 S fS ' fr [E = (16) 

and the are constants satisfying ^ ^ (called 

stability of the filter i’) . If the E- are zero for all 
negative (positive) values of ; this is a causal 
(acausal) moving average. If neither is true, it is 
called a two-sided MA. An MA is said to be of order 
Qv?) if the range of ?’ for which i\- is nonzero is from 


The stability condition assures that the pulse dies out at infinity, 
and is written in the form given because is the total energy 

output of an electrical filter if the input E represents the ampli- 
tude of the electric field at the input of the filter. The range 
of i may be finite or infinite. A finite MA is obviously stable. 

It is important to note that E is random and i’, if considered 
as a time series itself, is deterministic. That is, the process A 



has its random and its predictable aspects explicitly separated in 
the MA representation. Since R represents the new information 
arriving at the input of the system, it is called the innovat ion . 

We will be particularly interested in the class of MA's in which R 
is independently distributed, but it should be remembered that the 
definition requires only that R be uncorrelated. Sometimes the 
terms "MA process" and "MA model" are used nearly interchangeably, 
but this is a loose usage. An MA process exactly satisfies the 
definition given above. An MA model is a representation or model 
which can be used to attempt a description of any process, whether 
or not it is actually an MA. For example, one can use a low-order 
MA model to approximate a process which is a higher-order (or 
infinite) MA or not an MA at all. The pulse shape { t' - } is also 
assumed to be constant (independent of time, »). This will be seen 
below (in §IVa) to be less restrictive than it seems at first. A 
final point concerns normalization. If the switch C -*• aC, R -*■ a~*R 
is made, then .Y obviously remains unchanged. Hence, in comparing 
different moving averages, it is convenient to remove this ambiguit 
by specifying in some sense the "size" of either R or C. Several 
possible choices are: 


(1) 

r o " 1 


(2) 

a . 2 = 
n 

i-H 

It 

(3) 

E 

i 

= i 

(A) 

E <\- 

i 

= i 

(5) 

5 k '| 

= i 

(6) 

max (’. 
i 7 

= i 



(7) 

max | C • | = 

1. 
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For causal filters the conventional choice is (1). However, for 
acausal filters this choice would render the size of C dependent on 
the location of the time origin, which is to some extent arbitrary. 

(We will see another reason why this choice is poor in §IVe.) The 
other six choices make the size of C invariant to a shift of the 
origin of time. The best choice of normalization seems to depend 
on the particular context. 

To summarize: the moving average represents the deterministic 

part of a process with a constant filter, C, and the random part 
with an uncorrelated noise process, R. The process is the convolu- 
tion of C with R, and can be viewed as a random sequence of pulses. 

d) The Autoregressive (AR) Model 

The MA model expresses the correlations in a process A' in terms 
of memory — in the sense that the filter C remembers, for a while 
at least, the previous inputs R : . There is another way of express- 
ing such memory — by saying that the process remembers its own 
behavior at previous times, that is, A' remembers, or can be partially 
represented in terms o f -Vj, a h _ 2 , .... If it is assumed that 
this representation involves a linear relationship, the memory an 
be represented by an expression of the form Fj A' ;; _ j + 0 ^ + . . . . 

This suggests writing 
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where R n is a random noise process just as before, and the S's are 
constant coefficients. The first term on the right-hand side of 
this equation represents the immediate response of the system to the 
random input, while the others are the memory. The conventional 

notation is to write 4, ■ -B, , so that equation (17) becomes (with 

K K 

A a - 1) 

00 

s « ■ Z v„-k <18 > 

Jt-0 

or R = 4*.Y. If this sum is finite, say from c 1 to y , the process is 
called a (one-sided) AR process of order j', or AR(r) . Note the 
symmetry of this relation with that for the MA (eq. (15)), namely, 

X = C*R. Tlie AR is the inverse of the MA in the sense that the 
filters C and A are convolutional inverses of each other. By analogy 
with the acausal or two-sided MA, the sum in the last equation may 
be extended to negative k; this gives the two-sided AR 



4, X . 
k n-K 


(19) 


The concept of a process's memory of its own future may seem unusual, 
but we are dealing with post-real-time data analysis or with cases 
in which the independent variable is not time, so that causality is 
not relevant. Also, this extension is necessary for consistency 
with the two-sided MA in equation (15). The name autoregressive 
arises because the expression just above equation (17) is in the 
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form of a regression of X n on itself evaluated at different times, 
so that equation (17) is a self- or auto-regression. 

A schematic electric circuit representation of the AR process 
is shown in Figure 7. This circuit assumes a causal model, because 
there is no physical circuit that can generate future values. The 
discussion of normalization given above for MA's applies as well to 
AR models. Convent ionally A Q is set equal to 1; this will be done 
for some examples (such as the one to follow) but not generally. 


DEFINITION: AN AUTOREGRESSIVE (AR) PROCESS is one which 

can be written 


A o X n m E n A i X n-i ’ 

i + O 


( 20 ) 


or R * A*X, where R is an uncorrelated white noise process 

(as in the definition of the MA) and the As are constants 

satisfying £ ^. 2 < 00 (stability of A). The autoregres- 
L 1 

sive filter A is purely causal, purely acausal, or two- 
sided depending on whether A ^ is nonzero for only i J 0 , 
for only i 5 0, or for both i £ 0 and i £ 0. An AR is of 
order (p,p) if the range of i is from -q to p. 


An example of a second-order AR process is shown in Figure 8 . 
Note that it has a sinusoidal appearance (and would probably be 
called "quasi-periodic") even though it has no harmonic component 


1 
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AR PROCESS 



GAUSSIAN NOISE 



SPECTRUM OF AR PROCESS 



FIG. 8. -A realization of the second-order AR process 
X n - R n + 0.8.X w _j 0.75X n _2 (top). The middle curve is the 

realization of the Gaussian noise which drove the AR process. S 
X is purely nondeterministic the spectrum (bottom) is continuous 
but it has a narrow peak corresponding to the quasi-sinusoidal 
appearance of the process. 


nor any deterministic component. Figure 9 gives further examples 
of AR processes with quasi-harmonic appearance. 

Actual physical landom processes can often be well represented 
by an AR model with a small number of parameters A ■ . Equation (20) 
is a difference equation which is the discrete version of the dif- 
ferential equation which describes the dynamics of the system (i.e., 
the equation of motion). Thus, the AR parameters can be interpreted 
as the coefficients of the linear differential equation of the 
system. The moving average pulse is the impulse response of this 
differential equation. 

In fact AR models can generally be rewritten in the form of 
moving averages. As an example, consider the simplest nontrivial 
AR process, namelv the one-parameter process defined by: 


This corresponds 
of the left-hand 
gives an explicit 


to the AR filter 
side of equation 
solution in the 


(1, -a). Recursive substitution 
(21) into the right-hand side 
form of an infinite MA: 


■Vi 111 ' ?. • 


k=o 


n-K 


( 22 ) 


Thus an input impulse et time n*, of amplitude h )i*> gives rise to 
the output pulse ... 0, 0, 1. n, cr, or\ . . . (multiplied by 
a /; *) . For M < 1 this is an exponentially decaying pulse: 

0 n c n* 

(ii-ii*)Zii a . 

e ii N ii* . 


(23) 
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-175 - 0.85 


INPUT PROCESS: GAUSSIAN WHITE NOISE 



fit:. d . — A ser It's ot AK processes of tho form 
•V: = fyi + + ,l where I*' is independent Gauss ian noise 

(o’). and the values of ,,’j and , are shown at t lie rijtht. The pro- 
cesses were chosen to exhibit various spectral peaks, (v • none lias 
a deterministic harmonic component . The middle column ...,ows the 
sample (top) and theoretical (bottom) ant ocorrelat ions for each 
process . 
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Note that we have converted this one-parameter AR process into an 


infinite but stable MA ( C n -*■ 0 fast enough that the sum 


£ 

n = 0 


n 2 


converges). If j ot j >1 the pulse given above is not stable, and 
further C n -* « exponentially as n -*■ To avoid this difficulty, 
let n -*■ n + 1 and rewrite equation (21) as 

- “''V, ■ <«> 


Recursive substitution with this equation leads to 



-k 

a 


R . 

n+k 


(25) 


The effect of a single impulse at time n* is thus a growing expo- 
nential pulse of amplitude -ot -1 ? n * and growth constant a, terminat- 
ing at time n* - 1 (see Fig. 10). Thus, equation (21) has a stable 
solution for any ot, unless |a| = 1; in one case the pulse extends 
forward in time (i.e., is purely causal) and in the other it extends 
backwards (is purely acausal) . 


e) The Relationship Between the AR and MA Models 

In the example given in the previous section a simple AR model 
was converted into an MA. This is a general feature: any AR model 
can be converted into an MA and vice versa . In the standard treat- 
ments of this subject special restrictions must be placed on the 
models for this to be true, and some otherwise well-behaved AR 
models, for example, are not convertible into (stable) MA's. But 
with the generalization to two-sided representations, convertibil- 
ity holds without restriction. The fundamental reason for this is 



J 







evident from the example in equations (21) through (25): |a| >1 

led to a causal MA representation that diverged, and the restric- 
tion |a| < 1 is usually imposed. But if two-sided representations 
are allowed, this restriction is unnecessary because there is a 
convergent acausal representation. The MA corresponding to an 
arbitrary AR process is usually two-sided. Unfortunately the direct 
approach of recursive substitution of the AR representation into 
itself is extremely awkward in the general case, because at each 
step there are choices to be made concerning the form of the sub- 
stitution which have a complex dependence on the specific values of 
the AR parameters. However, the demonstration of how AR and MA 
models can be converted into each other, including the computation 
of the coefficients, is rendered simple by the introduction of 
Z-transforms, as will be shown in §IIIf. 

/) Autoregressive-Moving Average (ARMA) Models 

An obvious generalization is to allow the current value of the 
output, X n , to depend explicitly on (i.e., to remember) values of 
both the output X and the input R at other times: 

X n = B k X n-k + ^ C k R n-k ’ (26) 

or A*X - C*R , where A has the same relationship to the B ^ as before. 
This is called a mixed autoregressive-moving average model , or an 
ARMA model. If the processes involved are finite and causal [e.g., 
AR(p) and MA(p)] the mixed process is denoted ARMA ( p,q ). 


(Generalization of this notation to the two-sided case is cumber- 
some and is not necessary here.) Physically one can think of an 
ARMA process as representing a system, described by the AR param- 
eters A, which is driven by an input which is itself a moving aver- 
age process, rather than white noise. But as was indicated in the 
previous section, the distinction between system response as 
described by MA and AR models is merely a matter of interpretation. 
Hence there is no rigid distinction between what portion of a 
process is AR and what part is MA. In fact the AR part of an ARMA 
can be converted to an MA, yielding a pure MA. Similarly, an ARMA 
can also be converted to a pure AR. Furthermore, one could convert 
only part of the ARMA to MA (or AR) , sc that there is a great range 
of possible ARMA combinations to represent a given process. 

It may be asked "What is the use of mixed representations at 
all, since they can all be converted to pure AR or MA?" The answer 
lies in a concept called parsimony of representation. The point is 
that some processes may be representable as an infinite-order AR or 
MA, but as a finite ARMA. The latter would then be a more compact 
or parsimonious representation. Parsimony can be of great impor- 
tance in computing, where one is often searching for models involv- 
ing the smallest number of parameters. But it should be stressed 
that parsimony is not necessarily of significance in the interpre- 
tation of the results of modeling. A good example is that given at 
the end of §IId, which has the most parsimonious representation as 
AR(1), but might well be most simply interpreted as MA(°°). 
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There are several discussions of the form of the autocorrela- 
tion functions and power spectra of low-order AR, MA, and ARMA 
processes which should be consulted by the reader interested in 
such functions (Box and Jenkins 1970; Stralkowski, Wu and DeVor 
1970, 1974). 


g) AR Integrated MA (ARIMA) Models and Nonstationarities 


The discussion so far has assumed that the process under dis- 
cussion is stationary. This is an important restriction, for non- 
stationary processes do not have representations of the kind dis- 
cussed up to this point. But a very special kind of nonstationarity 
can be incorporated in a simple modification of the AR, MA, or 
ARMA models. The general form is 


A*(V U X) = C*R , 

where 7 represents the difference operator: 

= x„ - a ;. 


n ■ n 


'n- 1 


(27) 


(28) 




and V stands for the Jth difference operator, equivalent to oper- 
ating with V d times. If we let W = (so that W is an ARMA 
process) A' can be obtained by integrating U d times. That is, 

X = 7, where S is the summation operator: 

n 


s(x„) . r i.f„ - x { . 


(29) 


1--0O 


Thus X is said to be an autoregressive-integrated-moving average, 
or ARIMA, process. 
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Consider the simple case d m 1. While X is not stationary, 
its first difference is (Box and Jenkins 1970). The nonstationarity 
which this gives to X has the character of a floating mean value — 
the mean of the process is not constant with time but drifts. 
Similarly, a second-order ( d * 2) ARIMA process is such that both 
the mean value and average slope wander as time goes on. 

Finally, it is interesting to add a further generality in the 
form of a constant term in the equation: 

A*(V d X) = C*E + D q . (30) 

It can be seen that the meaning of the constant term D Q is to allow 
the process X to have a deterministic trend in the form of a poly- 
nomial of order d. 

The ARMA and ARIMA representations can be quite useful in some, 
specific applications. The current discussion will center on the 
less complex AR and MA models for simplicity and because they seem 
to be sufficiently general for most astrophysical applications. 

The reader should consult Box and Jenkins (1970) for more details 
on ARMA and ARIMA models. 

h) The Shot Noise Model 

As already mentioned, the MA is closely related to the shot 
noise mode l, which is usually defined in continuous time as follows 

X(t) = £ C(t - t.) , (31) 

i 
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where C(t) is a given function of time (a continuous pulse shape) 
and the t^ are random points in time which are Poisson distributed. 
This process can be viewed as the output of a continuous linear 
system, with impulse response C(t), resulting from an input con- 
sisting of a Poisson sequence of constant amplitude impulses 

Hit) = £ S(t - . (32) 

i 

The Poisson distribution results from randomly and independently 
placing the time points t^. The probability of having k impulses 
in an interval At is 

-AAt . ...k 

P k (At) = , (33) 

where A is a constant giving the mean rate of occurrence of the 
impulses, which here all have the same amplitude. If At is identi- 
fied with the time interval in discrete time (see §IIa) then equa- 
tion (33) gives the probability distribution of pulse amplitudes, 
where k is to be identified with the amplitude. (The amplitudes are 
quantized in unit steps.) If time is sliced finely enough so that 
AAt << 1, then we have 

I I - A At k = 0 (no pulse) \ 

A At k = 1 (one unit amplitude pulse)l ; (34) 

0 k = 2 (multiple pulses) I 

that is, most of the time a pulse does not occur, but occasionally 

a single pulse occurs, always with the same amplitude. It can be 
seen that the noise processes f, with large values of n, shown in 


kl 
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Figure 5 have approximately these properties (except that they are 
zero-mean processes and the amplitudes of the pulses are not always 
the same). Thus, an MA with pulse shape given by the discrete 
version of C(t) and with the quantized probability distribution of 
the input R given by equation (33) (or in the limit A At -*■ 0 by 
equation (34)), with k -*■ R, is the discrete version of the shot 
noise model. 

Some useful relations for the moving average, easily derived 
from the defining equations, are: 



and 

o/- <U- or»2) - 0 (36) 

These are somewhat different in form from the relations for the 
usual definition of the shot noise process. For example, if 

- 0 in a moving average, pulses of uniform amplitude are occur- 
ring at every time, and A’ is constant (o v 2 = 0); this is not true 

A 

for a Poisson distributed shot noise process where the variance of 
the amplitudes of the pulses is often taken to be zero. A related 
difference is that the concept mean pulse rate loses significance 
for an MA because it is automatically 1 per unit A t. That is, 
pulses occur at every point of (discrete) time. The incidence of 
zero amplitude pulses is expressed in the distribution function of 
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the innovation [as in eq. (34)] and is absorbed into the mean 
pulse amplitude. 

For a good discussion of the shot noise model see Papoulis 
(1965). Terrell (Terrell and Olsen 1970, 1972; Terrell 1972) has 
applied this model, with exponential pulse shapes, to several 
astrophysical problems. 

III. THE STRUCTURE OF PULSES 

The separation of a process into a random part and a purely 
deterministic part, as exhibited in the moving average, is often 

t- 

of direct physical significance. The pulse may represent the 
unfolding of some process for which there is a physical theory. 

Knowledge of the pulse shape^ may provide interesting numbers such 


^ The terms pulse shape, pulse, (moving average) filter, wave- 
let, impulse response, moving average representation , and moving 
average parameters are all used in the literature to convey approxi- 
mately the same meaning, and are interchangeable in many contexts. 
Here the term impulse will be reserved for a pulse, usually taken 
as the input to a filter, which is a delta function in time. 


as pulse width, rise and decay times, etc. Th< innovation, or 
random noise process /?, represents the pulse amplitudes and con- 
tains information about pulse rates and the distribution of pulse 
amplitudes. To develop a feeling for the structure of pulses, 
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this section discusses the representation of physical pulse shapes 
as filters, the algebra of filters, and a concept called the phase 
character (or sometimes delay character ) of filters. These subjects 
are discussed extensively in various mathematical works (Robinson 
1964a, 1967a, b; Treitel and Robinson 1966; Box and Jenkins 1970; 
Anderson 1971), which should be consulted for more details. The 
discussion here will be oriented toward the analysis and interpre- 
tation of astrophysical time series data and will emphasize two-sided 
filters, which have been neglected in much of the standard 
literature. 


a) The Discrete Representation of Pulse Shapes 

Suppose that a physical pulse is described by a continuous 
fu: ction of time, C(t). An example would be the light curve pro- 
duced by a nova or supernova. Let the values of C be specified 
(or "sampled") at evenly spaced points in time, say t n m n At, for 
some set of values of n; it is presumed that the points are close 
enough that the interesting structure in the pulse is resolved. 

Then the set of numbers or filter elements, {f^l (C(t n )}, is a 
discrete representation of the pulse shape C(t). 

One-sided pulses.- In many situations there is a moment before 
which C is identically zero. The classical example is the pulse 
which comes out of an electrical filter in response to an impulse 
at ti;ae t 0 ; in accordance with causality this output must be 
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1 


4 

i 




r 


exactly zero at all previous times t < t Q . By identifying the 
origin of discrete time, n * 0, with this moment, the filter ele- 
ments need only be given explicitly for nonnegative indices, 
rc ■ 0, 1, 2, . . , . Such a filter is said to be c ausal or one-sided . 

oo 

sum 2 Cyi can sometimes be ass- :iated with a physical quantity, 
n m o 

such as the total energy in an electrical pu.-;; if so 


£ ^ < - 


(37) 


must hold for any physical filter. This condition is called 
stability or convergence . In some cases, other stability condi- 

OO 

tions such as T! 1 C 1 < °° arc relevant (Robinson 196°. 51.1). A 
n-0 

filter which is both stable and causal is said to be physically 
realizable . We shall now see that some perfectly useful physical 
pulses are not causal. 

Tuo-sided pulses.- Consider the following scenario: a small 

signal grows with time, slowly at first, then more rapidly; reaching 
a peak, the signal begins to decay and eventually disappears. For 
example, take the specific form 

( e a ^ t < 0 (exponential growth) 


C(t) 



t > 0 (exponential decay) 


(38) 


or in discrete time: 


an 

e 



e 


n = . . . , -3, -2, -1, 0 
/: * 0, 1, 2, 3 


(39) 


The 
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In this case it is not convenient to take the origin of time at the 
beginning of the pulse, which strictly speaking lies at n ■ 

[Of course, it would always be possible to take the origin at some 
early time before which C(t ) is effectively zero, say to within the 
measurement accuracy. In the same sense almost all pulses can be 
taken to be of finite length.] A more important reason for con- 
sidering noncausal filters is that, among causal filters, only the 
members of a very special class (called minimum delay , a term to be 
defined below) have stable, causal convolutional inverses. Since 
our methods for determining pulse shapes from time series data 
depend on first determining the inverse pulse shape, restriction to 
causal filters would imply the unnecessarily limiting restriction 
to minimul delay filters. 

In many cases when a filter is written explicitly as an array 
of filter elements, such as (. . ., C_ 2 , C_^, C Q , Cj , C 2 , . .), 

the location of the origin of time is ob;ious (Cq in this example). 
But in some cases it is not obvious from the indexing or from the 
context, and a boldface symbol will be used to locate the origin 
[e.g., (1, -a) denotes C Q = 1, Cj = -a]. Figure 11 illustrates 
the basic difference between one- and two-sided pulses. 

b) Z- Trans forms 

We now introduce a powerful tool for the analysis of pulses, 
the Z-transform . It is a tremendous time saver in the manipulation 
of filters as well as in the proofs of certain relationships between 
filters. Consider a pulse or filter C = {C n ) , 
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ONE-SIDED 




Schematic representation of the difference between (a) causal pulses and (b) acausal 








n »•«</, -ii + 1, . . . , -2, -1, 0, 1, 2, . . . , /' - 1, p, containing 
p + q + 1 elements. The 2-transform of i' is defined as the follow- 
ing function of the dummy complex variable a: 

r 

C(z) = £ C,,z ,! . (40) 

n-—q 

This is simply a polynomial or power series in positive and nega- 
tive powers of a. In the case p or >/ = we assume that the 
series converges on the complex plane within some annulus including 
the unit circle. 

The coefficients determine the filter (and vice versa); that 
is, i'(s) determines the and vice versa. The transform will some- 
times be denoted with the operator " thus: i’(a) = .'(C). The 

inverse transform will be denoted 2 -1 , and can be thought of ns the 
operation of identifying the coefficients in a series expansion of 
C(z). The 2-transform has the following alternative interpretations 

(1) A representation of the time behavior of pulses in which 
a represents the unit delay operator (and a -1 represents the unit 
advance operator) . 

(2) A discrete analog of the Laplace transform: if f(t ) is 

replaced by 52 ~ ' N )i where t } . = n M, then the Laplace 

K 

transform of f becomes the 2-transform (a = e ‘ , where s is the 
Laplace transform variable). 

(3) Similarly a version of the discrete Fourier transform (OFT) 
with a = e‘ a '. 


(41) 


(4) A generating function for the filter C. 

The Z-transform maps from the time domain to a transform domain. 

The operations of shifting in time are denoted with the unit delay 
operator , D, and the unit advance operator A: 

= *„-l i - x „-j ; 

" x n+l • * Vj • 

In the transform domain iP corresponds to multiplication by z ^ and 

7 7 * 

A corresponds to division by z . The definitions, theorems, and 
proofs involved in the use of the Z-transform closely parallel 
those for integral transformations (such as the Laplace and Fourier 
transforms) of continuous functions. The Z-transform will be 
demonstrated in applications in the rest of this paper. Further 
details can be found in various sources (e.g., Jury 1964; Gold and 
Rader 1969; Oppenheim and Schafer 1975; Rabiner and Gold 1975). 


a) Convolution 


Consider the effect of putting a signal R into a filter C and 
connecting the output (say Y) into a second filter D. That is, 

C and D are placed in series (see Fig. 12). By definition: 


Y n ~ £ C k R k , 
k 


(42) 


so 


X„ = 


= E 


D k Y n-k 


= E h E c 0 r. 


I n-k-l 


"Lh Ec 

k 


k "'m-k^n-m 
m 


= T B R 
m i 


m 


m n-m 


( 43 ) 
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where 


m 


D, C , 
k m-k 


( 44 ) 


which is easily shown to be the same as 

B m - Z C k P^ k . W) 

Thus, the action of two filters in succession (series) can be com- 
pletely represented by a single filter, called the convolution of 
the two, written as 

B = C*D . (46) 

It is readily verified that the "-transform of the convolution of 
two filters is the product of their Z-transforms : 

B(z) = C(3)T(a) . (47) 

This is the most important reason for the utility of the n-transform. 
Furthermore, convolution is commutative and associative: 

A*B = B*A , (48) 

= (/W)* <’ . (49) 

It should be noted that the output of the MA is formally the 
convolution between the input noise process and the pulse shape, 
although the physical interpretation is somewhat different in this 
case (convolution of a process with a filter instead of two filters 
with each other) . 


V 
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d) Factorisation 


As will be demonstrated shortly, any finite filter with more 
than two nonzero elements can be broken down into the convolution 
of a number of shorter filters. In particular, a filter of length 
n + 1 can be written as the convolution of n filters of length 2. 

Such filters have two and only two successive elements nonzero and 
are called couplets or dipoles : C n+1 ). Since many of the 

important properties of pulses are invariant to a shift in time, 
it is convenient to take n = 0, and denote the dipole as (C 0 , Cj). 

This is acceptable if all pulses are shifted so that their first 

nonzero element is at n - 0 (i.e., causality), but to allow factor- V- 

ization of two-sided filters acausal dipoles of the form (f_j, £q) 
must also be introduced. Figure 13 depicts causal and acausal 
dipoles, and shows how convolutions generate longer filters. 

Now consider the filter {C ?J }, n = -q , . . . , p, where q and p 
are nonnegative integers. [This is not the most general case, as 
the index set might contain only positive terms (e.g., . . .,0, 

0, , c’ 3 , 0, 0, . . .), but such cases can be handled with the 

same methods.] The function 

F(s) = s q C(s) (50) 

(where C(s) is the '’-transform of (f }) is a polynomial of degree 
p + p, with nonnegative powers of s only. Hence by the fundamental 
theorem of algebra it can be written 


z o 


F(s) = c? 




P-fc?/ 

"A 1 
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CAUSAL DIPOLE 


ACAUSAL DIPOLE 







where the s ^ are the complex zeros of P(s). With a little algebra 
it can be shown from this expression that 


C{z) 


P*7 - , 

C .o n 

f t-p+1 




(52) 


With the definition 


a. = 

t 


(-s r /) -1 i = 1, 2, . . . . p 


-2 


£=p+l,p+2, . . p + q , 


(53) 


the inverse Z-transform of this equation gives 


C - A'[(l,a 1 )*(l,a 2 )* . . . *(l,a p )]*[(a p+1 1)* . . . *(^,1)1 , 

(54) 

where A' is the quantity in square brackets in equation (52). The 
first p dipole factors are causal and the last q are acausal. Since 
the ordering of the has not yet been specified, there are many 
possible distinct factorizations of this form, depending on which 
z ^ are assigned to the causal factors and which to the acausal 
factors. As will be shown in the next two sections, among the many 
choices possible for the origin of time in the original filter and 
for the assignment of the zj ' , there is a single choice which has 
the property that each causal (acausal) dipole has a convergent 
causal (acausal) inverse. It is obtained simply by making \a^\ < 1 
for all k, which can be achieved unless |s ■ 1 for some k. This 
can be considered as the unique factorization of the original fil- 
ter C, although it really represents merely the simplest of many 
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possible factorizations. If the original filter is causal, then 
q = 0 and the above analysis shows that there is only one factor- 
ization into causal dipoles — and this is the "unique" factoriza- 
tion which is usually discussed. 

e) Delay (or Phase) Character 

In electrical engineering the frequency response of a filter 
describes the degree to which an AC signal at a given frequency 
will be attenuated on passing through the filter. Another effect 
of a filter is to cause frequency-dependent phase shifts of signals. 
For the present applications, rather than view these effects in the 
frequency domain, it is more convenient to use the time domain. 

Consider first a causal dipole (Cq, Cj) as in the previous 
section. This filter is defined to be minimum delay (or minimum 
phase ) if |Cj| < j C Q | ; it is maximum delay (or maximum phase ) if 
|(7j| > | Cq | • These names are derived from the way in which energy 
is delayed at the output of the filter, as will be detailed below. 
Since delay properties are not affected by an overall shift in 
time, an acausal dipole (C_j, C Q ) is minimum delay if | C Q | < |f_j| 
and maximum delay with the opposite inequality. The case 
|Cq| * |C +J | is somewhat singular in that the inverse does not 
converge (see below); hence this case must be handled separately. 

Nov consider a filter C = { t7 7 * } of arbitrary length, say n + 1. 
Again because of time-shift invariance only the causal case need 
be considered. That is, if the filter is not causal. Its causal 
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equivalent should be used. The causal equivalent of a filter is 
simply the filter shifted so as to bring its first nonzero element 
(which may not exist if the filter is infinit to i * 0. From the 
previous section we know that there is a unique factorization into 
n causal dipoles. Each dipole is either minimum delay or maximum 
delay. If all the dipole factors are the former, the entire pulse 
is said to be a minimum delay pulse ; if the factors are all maximum 
delay, so is the entire pulse. If there are some of each, we have 
a mixed delay pulse . Thus, the delay character of the pulse is 
specified by the delay character of the dipole factors of its causal 
equivalent. The physical meaning of these concepts is as follows. 
Introduce the quantity 



K ~ 


this is the integrated energy — the energy which has come out of 
the filter up to and including time i — due to a delta function 
input at time 0 [for electromagnetic signals energy * (amplitude) 2 ]. 
This function rises from zero for i < 0 (since by assumption 

= 0 for i < 0) , monotonically, to its final maximum at i = n + 1, 

no 

and thereafter remains constant at a value P m m P^ +1 ■ C^ 2 , 

which corresponds to the total energy output of the filter. Corre- 
sponding to filter C there is a family of filters (all of length 
n -I- 1) which is generated by reversing all possible subsets of the 
dipole factors of C. The reverse of (Cq, Cj) is ((?*, Cq) , where * 
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represents complex conjugation of the possibly complex filter ele- 
ments. Correspondingly, the reverse of any filter is obtained by 
reflection about the origin of time and by complex conjugation of 
all of the filter elements. The (time) reverse of any array 
X ■ {.Y n ) wiil be denoted X ■ {.Y*^}. Since there are n such factors, 
this family has 2 n members, including the original filter itself, 
although they are not all necessarily distinct. It will be evident 
from the discussion in § 1 1 Ig that the power spectra and autocorre- 
lations of the members of the family are all identical. In fact, 
the family may be defined as the set of pulses of length n + 1 with 
the same autocorrelation and spectrum as C. Further, the total 
energy of all these filters is the same, so the partial energy 
curves of those filters all begin and end at the same points (see 
Fig. 14). Between these points the curves are quite different and 
even cross each other. But it can be shown that there is one curve 
which everywhere lies above all the others — and it corresponds to 
the single minimum-delay member of the family of pulses. That is, 
the energy output of the minimum delay filter is delayed as little 
as possible, among all possible filters with the same spectrum, in 
that at each moment of time the integrated energy is maximum. 
Similarly the unique maximum delay pulse has a partial energy output 
which lies below all the other curves and corresponds to delaying 
the energy as much as possible. 

Minimum delay pulses begin suddenly and decline slowly. In 
fact the minimum delay pulse rises as sharply and declines as 
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FIG. 14. -The concepts of minimum and maximum delay, (a) A 
short autocorrelation function. (b> The set of eight pulses which 
share this autocorrelation. (c) A plot of the eight corresponding 
partial energy curves: the uppermost curve corresponds to the mini 

mum delay pulse [dashed line, topmost part of (b) ] and the lowest 
curve corresponds to the maximum delay pulse [solid line, topmost 
part of (b)]. 


gradually as possible, consistent with the given autocorrelation. 
The maximum delay pulse is the time reverse of the minimum delay 
and has the reverse of these properties. Further discussions of 
the physical and mathematical meaning of minimum delay are in the 
geophysical literature (Robinson 1962, 1963, 1964a, 1966, 1967b; 
Smylie, Clarke, and Ulrych 1973; Berkhout 1973; Schoenberger 1974). 

f) Inverse Filters 

The filter which assumes the role of unity for convolution is 
the delta function, 

6 " {5 n,0 } ■<•••• °» °» 0, 0, . . .) , (56) 

since convolution with it leaves any filter unchanged. Then given 
any filter C we can ask whether there is an inverse, C _1 , such that 
■ 6. The answer is obtained by Z-transforming this equation: 

C(z)C~'*(z) = 1 , (57) 

so 



where Z -1 denotes the inverse Z-transform. Hence finding the 
inverse of C is reduced to finding the coefficients in the series 
expansion of the reciprocal of the Z-transform of C. Such expan- 
sions always involve choices as to whether to use positive or nega- 
tive powers of z. The choice is made on the basis that the result- 
ing inverse filter should converge, as will now be explained. 
Consider the dipole factorization given in § T 1 1 d . It is easily 
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seen that the inverse of the filter is the convolution of the 
inverses of its dipole factors, so the problem is reduced to find- 
ing the inveise of a dipole. Consider fix. causal dipoles which, 
except for a constant factor, can be written ( 1 , -a). The 
Z-transform is (1 - as). Which expansion of (1 - as) -1 converges^ 

^Convergence at z = 1 is implied, because we are really inter- 
ested in the convergence of the coefficients of z n in the expansion 
of the Z-transform. This allows use of the DFT, because 
\s\ = |exp(-£u))| = 1 on the unit circle. 


depends on the magnitude of a: 

; 1 + as + ( as ) 2 + (as ) 3 + . . . 
[-[(as) -1 + (as) -2 + (as) -3 + . 


(1 - os'* 


-l _ 


if | a j < 1 
. .] if la I > 1 


(59) 


Thus the Z-transform of the inverse of a minimum (maximum) delay 
causal dipole must be expanded in positive (negative) powers of s 
if the result is to converge. If C = ( 1 , -a) 

1 ( 1 , a, a 2 , a 3 , . . .) |a| < 1 

. (60) 

(. . ., -a 3 , -a 2 , -a, 0 , 0, 0, . . .) |a| > 1 

(Cf. Fig. 10 and the associated discussion in §lle.) Similarly, a 
maximum (minimum) delay acausal dipole gives a convergent expansion 
in negative (positive) powers of s. It is easy to prove (e.g., with 
Z-transforms) that a minimum delay causal dipole has the special 
simplifying property that its inverse is also minimum delay and 


63 


causal. The same holds for the convolution of arbitrarily many 
such dipoles. Similarly the inverse of a maximum delay aeausal 
pulse is maximum delay and aeausal (see Table l). Because of this, 
it is convenient to arrange the factorization so that all factors 
are in one of these forms. This can always be accomplished as 
follows: suppose !' of the zeros of .'(c) sat isfy |s | N l and the 
remaining k ,' zeros satisfy | z ‘ | 1 (assume all 1^/ | ^ 1). Then 

shift the time origin of so that in the notation of §11 Id p = P 
and = i,’. Then assign the /' zeros which lie outside the unit 
circle in the complex plane to the p causal dipoles in the factor- 
ization (eq. (52)) — these will be minimum delay. The i,’ zeros 
inside the unit circle are assigned to the q aeausal dipoles, which 
are then maximum delay. This factor izat ion represents the filter 
as the convolution of two factors: 


= (l,a, )*(!„:.)* . 


*( 1 ,>:„) 

i 


(p factors, minimum delay, causal) 


(hi) 


= (*:. 4 . 1 ,1)*Cj. 4 . : ,1)* . . . *(if^..l) 


(;,’ factors, maximum delay, aeausal) (62) 

so that r = x (;•’*(;) and = (K ~ ' ) * *o~ ‘ ) , where K is as defined 

above. Note that anil |f -1 have t be same delay and causality 

properties as do F and J, respectively. It can be shown that the 
Laurent series thus generated for (’”*(;;) converges within an 
annulus in the complex plane which includes the unit circle, and it 
is the coefficients of the various powers of ;; in this series 
wl licit give the elements of i’ - '. 
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TABLE 1 

PROPERTIES OF DIPOLES AND THEIR INVERSES 






In many of the standard treatments of this subject only causal 
filters are allowed. It then results that a filter has a convergent 
inverse if and only if the zeros of its Z-transfi lie outside 

the unit circle — otherwise the forward expansic '"ges and the 

acausal backward expansion is not permitted. I.. w r words, only 
minimum delay (causal) pulses have (causal) inverses, and then the 
inverse is also minimum delay. This problem was apparently first 
discussed by Wold (1938b). Two-sided filters always have a con- 
vergent inverse (unless a zero lies exactly on the unit circle). 

In practice, a very convenient way to evaluate inverses is to 
replace the Z-transforms in equation (58) with the discrete Fourier 
transform (DFT) . A code for this procedure is contained in the 
Appendix. Specifically, given a set of filter elements {f/}, one 
evaluates the DFT of takes tiie reciprocal term-by-term, and then 
obtains the inverse DFT. This procedure automatically provides the 
correct convergent expansion of a two-sided filter — without explicit 
evaluation of the zeros of the Z-transform of t he pulse! For 
example, consider the pulse C = (1, -a). The DFT procedure yields 
the inverse (1, u, w , a-*, . . .). If |aj < 1 this is obviously the 
correct inverse, interpreted as a causal pulse. Many terms may be 
necessary to get a good representation of the pulse shape, espe- 
cially if | it | is close to 1. If |a| > 1, the above inverse, 
interpreted as a causa 1 pulse, is divergent (or "unstable"). The 
trick is to note that for any finite number of terms. 
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d, a, a 2 , ...» a n ), there will be one largest term, a n . The 
inverse should then be renormalized to make this element unity: 
(a _M > • • •» <z“ 2 , a -1 , 1), and then interpreted as an expansion 
backward in time, ( a ~ n , .... a ~ 2 , a -1 , 1). This is the correct 
(acausal) inverse if |a| > 1. The same procedure works in the 
general case, in which the inverse pulse extends both forward and 
backward in time.** In general some zeros must be appended to the 


**In this case the time origin does not appear at a fixed 
place in the inverse and must be identified by some other means. 

This inability to pinpoint the origin of time in the calculated 
inverse is the price paid for not having to determine the zeros of 
C(z). Specifically, if we knew how many zeros lie inside and out- 
side of the unit circle, we could then locate the origin. Fre- 
quently, but not always, the origin is located at the peak of the 
inverse pulse. 

original pulse before applying the DFT inverse because the inverse 
is almost always longer than the filter itself. For two-sided 
pulses this is also needed to ensure that the backward and forward 
tails o r 2 inverse pulse do not overlap, due to the wraparound 
feature of the DFT. (Envision the arrays pasted on the surface of a 
cylinder, with the righthand and lefthand ends abutting. Any set 
of entries on the right end can be transferred to the left end 
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without affecting the DFT. This is illustrated in Figure 15.) 
Examples of inverses calculated in this way are shown in Figure 16. 

While the inverse as defined here is unique, there are other 
inverses which can be defined. Noting, for example, that the exact 
inverse of most filters will be infinitely long, one can ask: What 

finite filter, of fixed length, is closest to being an inverse to 
C in the sense that the sum of the residuals from the delta 
function, 



2 


(C*C ~ l ) . -6 . 

z z 


is minimum? The solution to this problem is the truncated approxi - 
mate (least squares) inverse of C, and is discussed extensively by 
Robinson (1964a, 1967; see also Treitel and Robinson 1966). One 
could just as well ask for the truncated inverse which minimizes 
the absolute value residuals [see Claerbout and Muir (1973) for an 
interesting discussion of some of the properties of this inverse]. 
Inverses may also be evaluated by various techniques which involve 
determination of the zeros of the Z-transform of the filter (see, 
e.g., Steiglitz 1974), but this approach is computationally quite 
laborious compared to the DFT method. 


<j) Correlation Funotiona and ioiser Cyootra 

The autocorrelation function of a process V is defined as 

O x (n t n) - ((Aj, - X) (.Y„ - A)) , (63) 


V- 
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FIG. 16. -A sample zoo of pulse shapes (left) and the corre- 
sponding inverses (right) as determined with the discrete Fourier 
transform. 
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FIG. 16. -Concluded. 



where X = (X n ). Section Ila outlined its significance. The power 
spectrum is the Fourier transform of the autocorrelation and also 
is equal to the squared magnitude cf the Fourier transform of the 
time series itself. We shall give, without proof, expressions 
which are readily derived from the definitions. 

For a moving average X « R*-R, where R is assumed stationary 
and with spectrum P R (u) = 1, we have 

P x (n,m) - p (n - m) « a R 2 p c (n - m) - X 2 , (64) 

where a R 2 = (R n 2 > is the variance of the innovation and is the 
autocorrelation of the pulse, defined by 

P C ( " - S C k C k+n-m ' (65) 

\ 

It can be seen that the autocorrelation is the convolution of the 
pulse with its reverse. For zero-mean processes (e.g., with 0) 

the autocorrelation of the MA is proportional to the autocorrelation 
of the pulse shape. Similarly, for this case the spectrum of the 
process is equal to the spectrum of the pulse shape: 

5 a ,( u )) = | C ( u ))| 2 , ( 66 ) 


where f(<jj) is the Fourier transform of the pulse: 


PU) 


£ C, e 


t K u) 


(67) 


and the normalization of R is such that P^(w) “ 1. In terms of 
^-transforms we have 

P Y ( u») = c'*(^“ 1 )^(^) (s = e tW ) . (68) 
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For an AR process, R * A*X , it is easy to 3how that 


S X M 


M(w) 


( 69 ) 


where A (to) is the Fourier transform of the AR filter: 


A(u) - ^ A. e 


iku 


(70) 


Finally, for an ARMA process, • R*C , 

2 
|2 


P (w) „ l££“lL . 

A U(u>)[ 


(71) 


It is readily verified from these formulas (or directly from the 
definitions) that both the spectrum and autocorrelation of X are 
unchanged by time reversal of C, a result alluded to in §IIle. 


IV. MODEL CONSTRUCTION 


The tools are now at hand to construct stochastic models from 
time series data. In outline the procedure is: (1) obtain data 

from one or more realizations of the process of interest; (2) decide 
on the form of the model to be fit to these data; (3) use the data 
to generate estimates of the model parameters; and (4) if necessary, 
transform the resulting model to a form more easily interpreted 
physically. (The last step recognizes that the form most suited 
to computations may not be the most suitable for comparison with 
physical models. Typically a low-order AR model is easiest to com- 
pute, and the corresponding MA has the simplest physical interpre- 
tation. See §Vf.) The stage will be set by presenting an 
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existence theorem which justifies the concern in § 1 1 and IITI for 


the MA and AR models, by asserting that any stationary process can 
be represented with these models. Then explicit methods for the 
estimation of the parameters in these models will be developed. We 
assume that all processes of interest are stationary. 

oJ .-h; Ft: intense Thrown: the Wold Pooonvopitinn 

Moving average models were introduced in §11 as a rather arbi- 
trary way of representing "memory" or correlations. The question 
arises as to what processes can be represented in this seemingly 
very special form. The surprising answer, first demonstrated in 
1938 by the econometrician Herman Wold (1938a), is that any station- 
ary process can be so represented. The simple explicit form, known 
as the Wold Decomposition , is given in the following theorem. 

THE WOLD DECOMPOSITION THEOREM: Given any stationary 

process, A', there exist: 

(1) a purely deterministic process P„ 

(2) an uncorrelated zero-mean noise process Ft, and 

(3) a moving average filter C, 
such that A' m H*C + P. 

This is a decomposition of A into a deterministic part („ n ) and a 
random part (.*YV). The random part may contain correlations and 
can in turn be deconvolved into a moving average, in which the 
correlations are represented by the deterministic filter C and the 
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purely random part is contained in the white noise process R. If 
the MA is restricted to be causal, this decomposition/deconvolution 
is unique (except for a constant factor which can be exchanged 
between R and C) . It is not unique without the causality condition, 
because there are other noncausal MA representations. This non- 
uniqueness is the subject of the following subsection. If, in 
addition, X has an absolutely continuous spectral distribution 
function (i.e., X is itsel f .iot deterministic) , then C is minimum 
delay, and therefore has a convergent, causal, minimum delay 
inverse A. This fact assures the existence of a unique autoregres- 
sive representation of the detrended process .V - P, in the form 
A*(X - D) m R, where A m C -1 . Thus the Wold theorem establishes 
that any stationary process, with its deterministic part (including 
the mean value) removed, can be represented as an MA, AR, or a 
mixed ARMA process (see § Ilf) . 

For a thorough discussion and proof of this theorem see Hannan 
(1970, p. 137) or Robinson (1964b, p. 126). The following informal 
proof conveys the spirit of these rigorous works. Consider a given 
stationary process X, which for simplicity will be taken to have 
zero mean. The forward predictor of order p is defined as 

*» p) - £ V*-* • <72) 

k* l 

for any set of numbers B^, k = 1, 2, . . . , This linear expres- 
sion is designed to forecast the value of X n , based on the previous 
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values A' ?! _ j , X n _ 2 , 


, A The quality of the prediction of 


course depends on the values of the P - . Those values which give 
the best predictions form the optimum predictor of order p. Mere 
specifically, the optimum least-squares predictor of order r is 
defined as that which minimizes the mean square prediction error. 


■w - £[’, - ry ■ 


(73) 


with respect to the parameters in F. The optimum predictor is the 
limit as r *■ 00 . A very important process is that defined by 
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the error made by the optimum predictor at time This random 
process is to be identified with the white noise process R in the 
definitions of AR, MA, and AliMA process-. ■ (§11) and is ea- the 
innovation of the process .V tKailath 1968; darzen 1969). The error 
at time n is due to the new pulse starting at that time, because 
the effects of pulses starting at previous times are completely 
incorporated into the optimum prediction. That <o,,> = 0 follows 
immediately from the vanishing of <.V r ) and the definition of R. It 
can be shown (Wold, 1938a) that 

(v ,;r ) - o for all ;• • o . (75) 

\ K-K >V 

Intuitively this is so because R r is the error made at time u by a 
predictor optimized on all prior data (i.e., V,,_ ; * V ( _„, . . .), 
so there can be no correlation of with these data, otherwise the 
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correlations could be used to improve the already optimum predictor. 
It follows that = 0 for all m f n; for, taking m > n without 

loss of generality, 

M ■ - V) ' Vn - £ B k Wn-k " 0 <76 > 

K 

because all of the terms are of the form in equation (75). This 
makes the R a kind of orthogonal set, and the process X can be 
expanded in the series 

cr 

C.R + D , (77) 

n k n-k n 

k=i 

where D is a residual process, orthogonal to i?. By t’ne usual 
technique of multiplying this equation by R m and taking expectation 
values, the expansion coefficients can be found: 

c k - (Vn-*) ' (78) 

(This formula is an alternate way of computing the MA parameters 
and has some advantages over the direct inversion C = A *.) The 
final step, that D is deterministic, is a consequence of the van- 
ishing of the prediction error for D. The details of this proof 
can tw found in the above references. Caines and Sethi (1979) give 
an interesting discussion of causality and the Wold theorem. 
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b) A Less Restrictive Existence Theorem 


The moving average filter of the Wold representation is 

(1) convergent (or stable ) : C L 2 < 00 , 

k K 

(2) causal: = 0 for k < 0, 

(3) minimum delay (see §IIIe), 

and 

(4) constant (c?^ independent of time). 

Extending Robinson's (1962) terminology, we call any filter with 
these properties a minimum delay wavelet . It is indeed a curious 
feature of the Wold theorem that an arbitrary stationary process 
can be represented in such a special form. What about an MA 
process with a pulse that does not have these properties? The Wold 
decomposition exactly represents such a process with an MA model 
which DOES have these pr pertles . For example, it represents a 
mixed-delay MA in terms of minimum delay wavelets. It would seem 
that such representations are misrepresentative. Some processes 
seem to have better representations than the one provided by the 
Wold theorem. But how can this be? The ar r lies in the fact 
that, while too restrictive with the pulse C, the Wold decomposi- 
tion is too liberal with regard to the innovation. It would be 
preferable, at least for physical processes consisting of indepen- 
dent pc' es, to restrict the renovation to be independently dis- 
tributed — not just uncorrelated — and to allow the pulse to be 
mixed-delay and acausal, rather than assuming causality. 
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( 1 nc idental 1 y . there ' ■ presumably a similar extension in which 
constancy of tho pul so is dispensed witl'., for ono ran oonstruot a 
stationary MA with noncons* ant pulsos. Stability cannot bo dis- 
pensed with because it corresponds to finitonoss of obsoivable 
quant it ios. ) 

A key point is that a given stat ionarv process can bo repre- 
sented by any member of a large family of MA models. The members 
of the family share a common autocorrolut ion and power spectrum but 
have different dolay/causal itv properties; the correspond ing inno- 
vat ions have different degrees of randomness ranging from uncorre- 
lated to independently distributed. Tho Wold t lioorom singles out 
the unique minimum delay wavelet representation because only causal 
filters are permitted. Tho existence theorem for the more general 
representations is as follows (Scargle l l >77): 


TUT T.XTKNDKP m-A'OMl’OS li'lON TtiKORKM : Wivon any stationary 

process A, there exist: 

(l) a purelv deterministic process 
(d) a family of uiu-o rrel at ed , aero-mean noise 
processes, ’ }, and 

CO family of two-sided moving average filters. 



such that A = \ Tho filter t'ami ly is t ho set 

of all filters which have the same aut ocoi re 1 at ion fund ion 
as V; one of them is minimum delay, and one maximum delay, 
and the rest are mixed delay. 

7 " 


* 




The proof is simple. Since A' is stationary, the Wold theorem 
applies and assures the existence of a unique, causal, moving- 
average representation, 

x - d = i r*-r , (79) 

where is a minimum delay wavelet. It was shown above (§IIId; 
see also Robinson 1964b; and Smylie, Clarke, and Ulrych 1973) that 
there is a family of filters which share a given autocorrelation 
and which can be obtained from each other by all possible combina- 
tions of time-reversal of the dipole factors. We define the family 
{C^} as the set of all filters which have the same autocorrela- 
tion as d*. If (d is finite, of length /V + 1 , chen there are 2^ 

(not necessarily distinct) members of this set. One is minimum 
delay (d itself), one is maximum delay (the reverse of d' ) , and the 

rest are mixed delay. For each ( define and 

R (i) = _ p). Then 

= c (i Ka <4 ’)*(X - P) = A' - ? , (80) 

establishing the desired representation. A direct calculation of 

( i ) 

the autocorrelation of P shows that it is the same as that of 
A’ u , namely a‘ > (S ?zr; , and this completes the proof. The uniqueness of 
this family is also readily demonstrated. Note that the represen- 
tations in this theorem are not just similar, they are exactly 
equivalent . They differ only m the way in which the random and 
deterministic, parts are assigned to the innovation and to the pulse. 
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It is possible that a theorem stating that one and only one 
of the R is always independently distributed can be proved. 

I do not know whether this theorem is true. There are theorems 
dealing with the existence of nonlinear representations with inde- 
pendently distributed innovations (Rosenblatt 1971) or innovations 
with the martingale difference property (Segall 1976). Therefore, 
it seems likely that further restrictions beyond stationarity must 
be imposed on a process to ensure the existence of a linear MA with 
an independently distributed innovation. Our point of view will be 
to assume the independence of the noise driving the observed pro- 
cess. Then one of the family of representations will certainly be 
independently distributed; this one will be regarded as the correct 
one, as it most completely and faithfully separates the random and 
nonrandom parts of the process. It is easily seen that the innova- 
tions of the other representations can be written as linear combina- 
tions of the i.d. one at different lags [cf. eq. (86) below] and 
are therefore dependently distributed. Although exactly equivalent 
to the correct one, these will be considered incorrect representa- 
tions because their innovations are not purely random. 

A concrete example wixl help clarify these matters. Consider 
the exponential pulse 




0 k < 0 


-bk 

e 


k > 0 


(b > 0) 


(81) 




81 


which has been invoked in astronomical shot noise models (e.g., 

Terrell and Olsen 1970). This minimum delay wavelet is the inverse 

of the simplest possible nontrivial AR filter, that is, the one- 

-h 

parameter model used as an example in §IIIf, with a = e Let P 
be an i.d. noise process, and consider the moving average .Y = R*C. 
The inverse of t’ is the dipole (1,-a). Hence the family of MA 
filters for this process has only two members, namely 

= (1, a, a 2 , a 3 , . . . ) , (82) 

and 

= (. . ., <r\ a 2 , a, 1) . (83) 

The corresponding inverses are (l,-i<) and (~u,l). The MA represen- 
tations are Y = d J *R (precisely the form used to define Y) and 
Y = *R' , where 

R’ = (-a,l)*Y = (-(7,1 ) * (1 , -c?) ~ 1 *R = P*R , (84) 

with 

P = (-u,l)*(l,-a) -1 . (85) 

The pulse F is fundamental in the algebra of dipoles: convolution 

with F of a filter that has the dipole factor (1,-a) reverses that 
factor. With the aid of Z-transforms the following explicit forms 


can be derived: 



and 


*ft"< 


(1 - ii 2 ),f * !c 5 0 

ft - 1 . (87) 

VO ft - 1 

It might be surmised from inspection of equation (86) that Rj and 
R n + 1 are correlated because they have many terms in R in common. 
However, a straightforward calculation yields 


w ■ ■ <W 


( 88 ) 


ana 


= 5 f;0 [i.e., (. . ., 0, 0, 1, 0, 0, . . .)] . (89) 

Figure 4 shows an example of processes related in this way: that 

in Figure 4(dP is independently distributed, and Figure 4(cl) is 
the same process (same realization) filtered with P. The pulse F 
is graphed in Figure 17. It is perhaps surprising to find a pulse 
other than the delta function itself which has a delta function 
autocorrelation. There are many such pulses. They are sometimes 
called all-pass filters . The filter , for arbitrary .1 of 

order V or less, has this property is the unit delay operator 
defined in eq. (41)]. [Radar design is one application where 
unautocorrelated pulses are sought (Boehmer, 1967).] Note that our 
process, constructed as randomly occurring, decaying exponential 
pulses, can also be represented as randomly occurring, growing 
exponentials! These representations are mathematically equivalent, 
as t "" *R = ; ' *:\ ' . But ; *.T is a better representation because it is 
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FIG. 17. -The pulse F = \F r ,} , described in the text, which has 
a delta-function autocorrelation. This was the pulse in the moving 
average shown in Figure A (c). 


ti same one used to construct the process in the first place. It 
is better in the sense that its innovation is independently dis- 
tributed and not merely uncorrelated, as is the innovation /?' . 

c) Deconvolution via Independentlu Distributed Itmovations 

The previous subsection assures that a MA representation 
exists. While it is not automatic that this linear superposition 
of constant pulse shapes is physically significant, it frequently 
is. That is, random processes which occur in nature often consist 
of the summation of independent pulses. Since the moving average 
model represents a process as the convolution of a pulse shape C 
with an innovation R, the process of deduc ig the model (C,R) from 
time series data is called deconvolution . The goal is to disen- 
tangle the overlapping pulses from each other, revealing the under- 
lying pulse shape and information about the amplitudes of the 
pulses. 

Most of the standard deconvolution techniques (§IVd) are based 
on least-squares modeling or the autocorrelation function and are 
therefore insensitive to the information needed to determine the 
phase character of the pulses. Such techniques cannot distinguish 
among the representations in the extended decomposition theorem. 
Further, if the driving process R is normally distributed (Gaussian) 
noise, it can be shown (Parzen 1962, §3.4) that the process .V = R*C 
is also normal and therefore completely characterized by its mean 
value and its autocorrelation function. In this case no technique 
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can recover ( -\e phase information. The pulses in an MA driven by 
Gaussian noise overlap so much that the phase information is irre- 
trievably lost. However, many physical processes are not normally 
distributed, and for these the problem arises as to how to deter- 
mine the pulse shape with the correct phase property. This is the 


FUNDAMENTAL PROBLEM: Given data sampled from the moving 

average process A' = R*C, where R is independently dis- 
tributed noise and C is a (not necessarily minimum delay) 
pulse, find estimates of the pulse shape C and amplitude 
sequence R. 


The standard techniques determine the minimum delay pulse which has 
the same autocorrelation as C. But if K is not Gaussian, the cor- 
rect pulse shape can be recovered. The key fact is that the inno - 
vation corresponding to the correct pulse is independent! v distrib- 


uted, while the other members of the family of innovations in the 
extended decomposition are not independent. 

The procedure to be described here is a direct search for an 
independently distributed innovation. We seek the model (AR, MA, 
or ARMA) which, of all models consistent with the sampled data, has 
the least dependence in the distribution of the innovation. Begin 
by writing, in terms of the data A', the innovation as a function of 
the model parameters [eqs. (15), (19), (26), and (27)]: 


4 
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A*X 


(AR model) 
(MA model) 


C" 1 ** 


R « 


I A *C~ 1 *X (ARMA model) 

A*C~ l *tf d X) (ARIMA model) 

Because of its simplicity and practicality the AR model is the 
prototype in this discussion, but the others can be treated in 
much the same way. The explicit form of R in this case is 


(90) 


P 

E u = A k X n-k (n = p + 1, p + 2, . . . , N - q) , (91) 

k=-q 

where A is of order ( p,q ): A = ( A _^ , ...» ,l_j, A Q , Aj, . . Ap) . 
Then construct a measure oi the dependence of the process R, and 
minimize it with respect to the model parameters. There is no one 
correct way of defining a suitable dependence measure. Correspond- 
ing to each of the definitions of independence given in §IIa there 
is the following quantity which could be used as a measure of the 
dependence of the process R : 


(1) F M (I‘ V r 2 , . . r M ) - ^ 1 (^ 1 )^ d 1 (^ 2 ) • • • 

using probability distributions 

(2) F^Xr v v 2 , . . r M ) - F^F^rJ . . . 

using cumulative probability functions 

(3) ^(^j> U 2’ ' ’ ‘ ~ ^ j (^ 1 ^ 1 (^2^ ‘ ‘ ' 

using characteristic functions 

( 4 ) . . . F M %)) - (^ (* > ))(<? 2 (/? 2 )) ’ • ‘ 

(q.XRj) using expectations 

* 1*1 1*1 ' 
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In these expressions a simplified notation is used to give the 
order of the statistical functions. If R were independently dis- 
tributed these four expressions would all vanish for all values of 
the appropriate independent variables (the r’s or Luc u's) or for 
all functions and for all values of the integer M. There is a 
variety of ways one might choose to estimate the statistical func- 
tions in these expressions or to assess the departure of the chosen 
expression from zero. Of these many ways of proceeding, different 
ones will undoubtedly be suitable for different kinds of problems. 
Extensive experimentation has led to one procedure which has worked 
well in a variety of test cases. This procedure is offered as a 
fairly general purpose one, but the reader may wish to consider 
other approaches to dependence minimization for his data analysis 
problems. 

How are the individual and joint probability functions in the 
above expressions to be evaluated? First, equation (91) [or, more 
generally, eq. (90)] generates R from the sampled data .V, as a 
function of the model parameters. The .esulting values of R are 
then used to estimate the function of interest, in the form of an 
average. Assume that R is ergodic , so that the desired ensemble 
averages can be computed as time averages. For example, to esti- 
mate (r ', where Q stands for F, F, or <f> , evaluate the 

aver > ’e 



N - (j 


+ 1 ) 


X-q- 1 

T 

>!=;•+ 1 


: 2 (?*,,. 


r N+l> 


(92) 
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The way in which the estimators (such as P„) are calculated is dif- 
ferent for each of the four forms (l)-(4) above and will be described 
below. Because time averages are used, no distinction can be made 
between the various second-order functions, such as $ 2 (r j, r 2 ), 

<5 2 (*% * r 3 ^» • • • > r k + Such a distinction is unnecessary, 

however, because the assumption that R is stationary means that all 
of these are equal anyway. The next step would be to consider 
third-order functions, such as (? 3 (r*j, r 2 , r 3 ), which are awkward to 
deal with numerically. Fortunately (Papoulis 1965), the added 
information by going from second to third order is contained in 
simpler expressions, such as Q 2 (r> n , r n+2 ), or in general Q 2 (r n , r n+m ) . 
The corresponding time-average is 


N-q-m 

2 ^ r n- r n+n ^'} >: ~ /y - (p + q + rn) ^ r nMr^ ’ 

n-p+l 

Hence, expressions higher than second order never need be considered. 

The final dependence measure is the sum of expressions such as 
equation (93), from rn = 1 to some maximum value, rn* [see, e.g., 
eq. (98) below]. What should this range of values be? Unless 
rn* <•' V , the small number of terms in sums such as equation (93) 
will make the estimates ill-determined. Numerical experiments of 
the kird described in §V, mostly with cumulative probability 
functions, have yielded the following: For simple models of order 

one or two, the single lag n = 1 may be sufficient in the sense 


that no further information is added by including larger lags. But 


' , -order models must be Larger than 1 if all. of the 
u ) a ibeiil the process is to be extracted from the data. 

: h. ■ : ■ uoie. for appears to be roughly equal to the number of 

1 r. ,m i : er.s, i.e., ,\ rationale for this empirical 

re--u! t is 1 ack ini’,, a l though it is not unreasonable. 

i he a ;>|)i oar lies using probability distribution functions (PDF), 
cumu i a! ive probabil itv functions, and characteristic < unctions were 
iesti.l on problems with known pule shapes end innovations. The 
dependence measure based on cumulative probability functions proved 
be far l ho best ( see : Via) , and the details of this approach will 
now be' given. A straightforward estimate of the cumulative proba- 
bility funot ion of ,7, is |cl. the del' ini t ion in eq . (7)1. 

. 7 * 

;V i (v) = j* £ ;/(.r - :•.,.) , (94) 

whore .7* = ,7 - (; + ;) and the /V have been reindexed as describe 1 
below, at tile' end of :',Ve . //(.;') is the unit step function: 

1 0 • (i 

(9b) 

1 , 0 

: be mu i in equation (94) is just the numboi of which are 

f:,i' ret ore 1 /.V* times the sum is an cost incite of /'•*{/•., _ v * . 

i . a .tep ! ur.e't ion, con s i s ! i in.’, of equal steps (of amplitude 1 /.’.’*) 

7' ' . I. of t lie . _ . S i in i 1 a r 1 v i he second-order joint c. -nil at hv 
■ A : i it ■- ! unction for lap. ■■ i estimated with 
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( 96 ) 


N*-m 

- W ^~ S > Jl Hx - - W • 

n=l 

In this expression the sum is just the number of pairs ( R n , R-n+m) 
such that 

R n £ x and R nMn £ y (97) 

[see the definition of F ^ given in eq. (5)]. The dependence mea- 
sure is taken to be 


m* m* 


Dp{A) = y ' D™{A) ''Jj \ F ™(x,y) - F^xW^y) 

dx dy . 

m=i m= 1 



(98) 


The evaluation and minimization of this expression are described 
in §V. 

An analog of equation (98) with probability distributions 
replacing probability functions is 


rri* 

R p {A) ~ P^P^y) 

m= l 


dx dy . 


(99) 


However, numerical tests have shown this dependence measure to be 
inferior to Dp. The reasons are readily understood. The estimates 
of Pj and P 2 involve the construction of intervals or bins in both 
R r - and ( R n , P n+m )-space, and then counting the number of points in 
the bins. This procedure has several difficulties. First, the 
results are considerably sensitive to the sizes and positions of 


C ~ 2. 
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the bins, and there is no obvious way to choose these optimally. 


Indeed, it appears that the optimum bins depend on the distribution 
of R, which of course is not known a priori . A second difficulty 
lies in the quantized nature of bin occupation: A sufficiently 

small change in A only moves the R points around within the bins, 
and leaves the number of points in the bins (and therefore the 
estimates of Pj and ? 2 ) unchanged. Hence the derivative of the 
penalty function. Dp, is highly discontinuous. This effect foils 
minimization methods which use gradients, and it also appears to 
produce a forest of local minima which makes the global minimum very 
elusive. The author achieved some success in alleviating these 
problems by weighting the points according to their distance from 
the bin center (a Gaussian dependence proved superior to exponential 
or linear), to remove the quantum effect. Even so, there were still 
numerous local minima in typical problems. The expression in equa- 
tion (98), because it uses cumulative functions, requires no binning 
and is a smoothly varying function of A. For low-order models it 
possesses a single minimum to which the minimizer converges rapidly, 
independently of the starting value. This result holds even with 
m* = 1. When the order of the model is larger, local minima invari- 
ably appear unless m * is increased (see §V) . 

Another problem with Dp concerns the treatment of the points 
that spill outside the chosen P-interval. Again some success was 
achieved with empirical remedies, namely the application of a 
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penalty for such spills (to be added to Dp) or defining the edges 
of the bins, in an i?-dependent way, to include the maximum and 
minimum i?-valued. But these stop-gap remedies were only partially 
successful at producing a well-behaved dependence function. It is 
also awkward to have so many adjustable parameters (number, size 
and location of bins, weighting functions, spill penalties, etc.) 
to be chosen arbitrarily or optimized using trial cases. In com- 
parison, the function Dp, given in equation (98) and evaluated as 
described in §Vd, is very well-behaved and free of undetermined 
parameters or functions. 

Thr characteristic function is intrinsically a continuous 
function of the A so the method based on these functions also 
avoids some of the problems discussed above. The first and second 


orders are 


^(w) = (exp (iuR n )^ 


( 100 ) 


ij >^ l (u l ,u 2 ) = (exp [i(Uji?„ + u 2 R n J nv ) l) 


( 101 ) 


The corresponding condition for independence is 


(7 (f) m (u 1 ,w 2 ) 5 <j> 2 m (w 1 ,M 2 ) - <t> 1 (M 1 )<t> 1 (w 2 ) = o 


( 102 ) 


for m = 1, 2, . 


. Since this function must vanish for all u. 


and « 2 , there are various expressions which could be adopted as the 
dependence measure, the most obvious being the integral 


= JJ I G^ 1 (u 1 ,u 2 ) du x du 2 . 


(103) 
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This measure, even with weighting functions thrown into the inte- 


grand, did not give very promising results and was numerically 


awkward. A. much simpler procedure comes out of the Taylor series 


expansion of the function in equation (102). Write 


00 

>i (u) oc 

k= o 


00 


du 


n 

kl 


(104) 


u=i 


and 


m 


2 (w lf « 2 ) ^ ^ 

k, ,7=0 


"fc+J . m. . 
3 ^ <^ 2 (u lf u 2 ) 




Z t J li 

~k\j\ 


u l = u 2 -Q 


(105) 


The quantities in square brackets are i^y^ and ., respec- 

tively, where the y's ar ■: the moments 


k ' 60 


and 


m 


' k, j 


(040 ■ 


Since all power.: of u j and u 2 must vanish in equation (102), 


m 


U k,j Vj 

for all k, and m. Accordingly, the expression 


D, 




mom 


(y. . - y.y -)ic'- W) 

t / < J 


(106) 


(107) 


(108) 


(109) 


'2,2.1 

m k, 

can be taken to represent the degree of dependence of the process R 


\t-'(k,j) is a weighting function]. For simple i.odels the single 
value m = 1, and just a few terms K,,f = 1,2, seem to suffice. The 

■)h 


term k ■ j ■ 1 corresponds to the autocorrelation function, and the 
terms k ■ 1, j ■ 2, and k * 2, j = 1 are related to the "time skew- 
ness function" of Frenkiel and Klebanoff (1967) applied to a related 
problem by Weisskopf , Sutherland, Katz, and Canizares (1978). The 
numerical tests showed that moments and characteristic functions 
have some merit for this problem, but again local minima were 
bothersome and no choice of weights for the u ' s or the p's c mid be 
found that yielded consistently satisfactory results. 

The author has not experimented with expectations of arbitrary 
functions [method (4) in the list, above), mostly because the infi- 
nite arbitrariness in choosing the function sets is so imposing. 

Finally, while it would not necessarily yield independently 
distributed innovations, a procedure based on maximizing the mar- 
tingale difference property (§IIa) was considered. In fact, the 
implementation is straightforward and easy. Select a set of iv-bins, 

denoted , and then evaluate the conditional expectation value 

& 



n such that 7?, ;+w <?<$»,• 


where. f is the number of n such tha; $ & .• . The "mart ingaleness" 

measure would then be 


D MDP " Ti ^n^'n+T! ~ ( : 'n) J » 

i 


( 111 ) 
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where 



(112) 


is just the (unconditional) expectation value of R . As will be 
seen in SVIa this procedure does not appear to be very effective. 


d) Predictive Deconvolution of Time Series 

Predictive deconvolution (Peacock and Treitel 1969) or predic- 
tive decomposition (Robinson 1967b) refer to the use of linear 
prediction (Makhoul 1975) . usually based on past data only, to 
yield information which allows representation of a process in terms 
of elementary building blocks (such as white noise processes, MA or 
AR filters, and deterministic processes). Since least-squares 
methods are almost always used, and these cannot recover phase 
information, only a brief sketch will be given. This discussion is 
intended to clarify the relation of predictive techniques to the 
material presented above, and also to motivate a technique (SIVe) 
which is a simple extension of linear least-squares prediction and 
which can recover pulse phase information. More details than are 
given here can be found in an extensive literature (Kolmogorov 1941; 
Mann and Wald 1943; Wiener 1949; Bode and Shannon 1950; Durbin 1959, 
1960; Walker 1962; Robinson 1964b; Gersch 1970; Akalke 1971, 1974; 
Chow 1972a; Kashyap 1974; Shlnners 1974; Astrom and SSderstrbm 
1974; Gertler and Barryasz 1974; Gersch and Foutch 1974; Graupe, 
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Krause, and Moore 1975; Tong 1975, 1976 — to name a few), and 
expecially the reviews by Robinson (1967a) and Box and Jenkins 
(1970). The reader interested in the new techniques only should 
skip to §V at this point. 

The basic principle of predictive decomposition is that a 
model which gives good predictions of the behavior of a process 
undoubtedly is a good representation of the process. Thus one takes 
a model with a simple structure and adjusts it (by adjusting the 
values of the model parameters) until some measure of the error the 
model makes when tested against the available time series data is 
minimized. This procedure is called optimizing the model. The 
goal is not prediction per se , but representation of the statisti- 
cal properties of the process. The hope is that the optimization 
will extract all of the Information about the process that is con- 
tained in the data at hand. 

The basics of the predictive approach are as follows. The 
term linear prediction used above simply means that the predictor 
is taken in the form 

x n " B i x n-\ + B z x n-2 + • * • + B ] i x n .] < ( 113 ) 

(cf. the autoregressive memory discussed in Hid and in the proof 
of the Wold Decomposition theorem in SlVa).^ That is, this 

++ The caret C) is placed over quantities which are estimated 
or predicted, based on data and (usually) a set of parameters such 
as the B j. It is to be distinguished from the symbol < > for the 
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expected value , which is a statistical average, depending on the 
whole process (theoretical expectation) or on a realization of it 
(sample expectation). 


expression is to be used to predict!! the value of X n , based on 


!!lt should be emphasized that the word "predict" is not meant 
in the literal sense, as it would be, for example, if we were 
Interested in real-time analysis of a manufacturing process we 
wished to control. Rather, we consider X n to be the guess or esti- 
mate we would make for the value of X n if we didn't know it, based 
on knowledge of values of X at other times. Conventionally, the 
restriction to the use of past data is imposed, but in general use 
can be made of past and future. (A two-sided prediction-error 
filter is sometimes called an interpolation operator.) 


knowledge of the previous values X n _^ t * n _ 2 » • • • only. The num- 
bers B. are related to the AR parameters and are to be determined 
by minimizing the prediction errors, in a sense to be defined. The 
error in prediction at time n is 

k k 

E n - *n - K - x n * Z Vn-i ' Z A i X n-i • < 114 > 

■£■1 £»o 

where we have taken A. ■ -5. and A n - 1. In other words 

t % U 

E - A*X (115) 
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is the sequence of prediction errors as a function of time, and for 
this reason A is sometimes called a prediction-error filter. Sup- 



I 




I'. 




\ 


H 

i: 


pose we take the sum of the squares of the prediction errors, 
that is 

E(A) -^E n z (116) 

as the measure of the errors which is to be minimized. In practice 
the length of the prediction filter is taken to be much less than 
the length of data ( k « N), so that a large number (It - k) of trial 
predictions can be evaluated. The minimization equations are 

■jq: - 0 (t - 1, 2, 3, . . .) , (117) 

or 



the expectation value of which is 


(118) 



0 (i - 1, 2, 3, 


.) 




(119) 


where p is the autocorrelation function. These are the standard 
Yule-Walker equations (Ulrych and Bishop 1975). The procedure is 
to use the data to compute an estimate of p, then solve equa- 
tion (119) for the coefficients .... Ulrych and Bishop 

give specifics and FORTRAN programs for carrying out this solution. 
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From the solution for A and ths dsta* it is straightforward to 
ealculats an estimate of ths innovation fron the relation in aqua- 


tion (115). Indaad , the sequence of prediction errors E n corre- 
sponding to the optimum A is an estimate of the innovation R. That 
is, R is both the sequence of optimum prediction errors and the 
sequence of pulse amplitudes. This equivalence can be understood 
by noting that, with the correct A , there is no prediction error 
at time n due to pulses starting before n — the error is totally 
due to the new pulse, of amplitude R n , hence E n m R n . This es ti- 
me te, of course, is of the innovation corresponding to the specific 
realisation of the processes which has been sampled , but 
therein is also contained information about averaged quantities, 
such as the pulee rate (which for a continuous distribution of 
anplitudes is expressible in the distribution function of pulse 
amplitudes) . The Yule-Walker equations can be generalised to the 
case of two-sided filters, but this is a useless exercise providing 
no added information. 

A procedure for determining A with the minimum delay condition 
imposed is due to Burg (1968, 1975) and is discussed by Ulrych and 
Bishop (1975), Fahlman and Ulrych (1975), Kanasewich (1975, Ch. 16), 
Ulrych and Clayton (1976), and others. The sum of the squares of 
the forward and backward prediction errors of & one-sided prediction- 
error filter, namely 
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B p “ 2 {If - p) 


£ 

n«p+i 



( 120 ) 


L\fe-o 

is minimized with respect to A. The first term inside the braces 
corresponds to the error made by the filter in predicting X n based 
on the p preceding values X n _ 2 , . . ., ^ n .p* Since least- 

squares modeling cannot distinguish one sense of the direction of 
time from the other, Burg Introduced the idea that one should 
include the backwards predictions, which are represented by the 
second term in equation (120). This term is the sum of the squares 
of the postdiction errors, made by the same filter (reversed) based 
on the subsequent values £ n+1 , ^^+2* • • •» *nvp’ ten8B the 

backward and forvard contributions to Ep , when expanded out, are 
ldeutical except for end effects. Thus burg's idea is most impor- 
tant for short segments of data for which end effects are most 
importaai. This procedure explicitly assumes that the process X is 
intrinsically symmetric, in that forwards and backwards predict lens 
need not be distinguished, and of course this is not generally true. 

The limits of the n-sum are chosen such that no datum outside 
the sample range, n • 1, 2, . . . , E % is ever called for - that is 
to say, the estimate is noncommittal about the unsampled data. (In 
some formulations of such problems the unsampled data is set to 
zero.) Therefore the resulting parameter values are "maximum 
entropy" estimates (Burg 1968; Lacoss 1971; Ulrych 1972; Abies 
1974). Hence A can be used to compute "n estimate of the power 
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spectrum £eq. (69)] of X (Burg 1967; Aka lkc 1969a, b, 1970b; Parcen 
1968, 1969) which ia called a maximum entropy method (MEM) epectrum. 
The nature of the predlctlone and the rangea of the summations are 
depicted in Figure 18. Details of the method are given by Ulrych 
and Bishop (1975) and more completely by Andersen (1974). The first 
of these references describes a convenient recursive solution to 
this least-squares problem, which imposes the minimum delay condi- 
tion explicitly at each step. This is the Levinson (1947) recur- 
sion, also discussed by Durbin (1960) and Burg (1975). Ulrych and 
0 Bishop discuss various practical matters, give a FORTRAN program 

for the determination of the AR coefficients as well as the spec- 
trum, and outline the use of the final-prediction-error (FPE) cri- 
terion for the determination of the length of the (one-sided) AR 
filter. 

This procedure is very efficient at determining the AR coeffi- 
cients from time series data generated by simple processes where 
there is little noise present. It should probably be used if it is 
known a priori that the pulse is minimum delay. In astronomy this 
is seldom the case. 

e) Predictive Deconvolution with the Absolute Value Nom 

The choice of the sum-of-squares of the errors, In equa- 
tions (116) and (120), is not the only possibility. Least-squares 
modeling is used because it gives maximum likelihood parameter 
estimates (Box and Jenkins 1970). It is also convenient because of 
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FIG. 18. -A prediction filter can be depicted ee a string 
of numbers which is to be overlaid on the time aeries to obtain 
the cross product X„ * 5? A»JL j., which la the predicted value oi 



the simplicity with which the minimisation can be expressed in 
terms of the autocorrelation function [eq. (119)]. But some other 
measure of the errors could be substituted for the mean square 
error. The AR parameters could be determined by minimizing the 
more general form E(A) * ||£ > || , where ||ff || denotes an arbitrary 
error norm. §§ For example , consider the L 0 norm: 


^Random processes can be considered as elements of a normed 
linear space (for L 2 thls is a Hilbert space with the inner product 
(X,Y) > tfY>) . A norm satisfies the three conditions: 

(a) ||y|| - 0 if and only if X - 0; (b) - |a| ||x||; and 

(c) \\X + y || £ 11*11 + || y 1 1 . These are pleasant but not necessary 
properties for a measure of the errors or residuals in model fit- 
ting. For example, the skew "norm" of Claerbout and Muir (1973) 
does not satisfy (b) or (c), but it is still a useful penalty 
function for residuals. 


v e > - (zi £ »r) • (m) 

The mean square error corresponds to a * 2. The usefulness of the 
choice a * 1 (Claerbout and Muir 1S/2; Scargle 1977) will now be 
demonstrated. Consider the MA process X * R*C t where f? is an inde- 
pendently distributed process and C is the two-point pulse (l,o); 
if \o\ < 1 C is minimum delay, and if |o| > 1 C is maximum delay. 
Introduce a two-point forward prediction-error filter A * (1 ,a) : 
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* 


( 122 ) 


B n " x n + 

i.e. , tho fora of the prediction is simply 

" -«*«- l • (123) 

The best value of a minimises £ a (£), which is equivalent to 
ainiaising 

r. /s»\ 1° - I V J- 

n-1 


n 1 

* 2C|<*n + °Vi> + fl (Vi + aH n- 2 ) 
n ' 

- £k + (a + o)/? n _i + aeff n- 2 | 


(124) 

(125) 

(126) 


This last expression is difficult to deal with because of the pulse 
overlap manifested in its three terms. But progress can be made if 
the pulse overlap is neglected, because its effects should average 
out. The prediction-error due to a single, isolated pulse at time 
n is 


E f - |* n r(l + \a + o\ a + |oc|°) . ( 127 ) 

But also consider the reversed or backward prediction-error filter 
4 - (a,l) , which leads to the error 

E b - |/? n | a (|a| a + |l + ao |° + M«) . (128) 

It happens that E f ■ if and only if a ■ 2. That is, least- 
squares prediction is identical in the forward and backward direc- 
tions and would yield the same result if the time series were 
reversed. The quantities Ef and E ^ can be easily minimized if 
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a 1. For a ■ 1 consider the graph of |a + o\ + |ac| in Figure 19. 
Each of the two terms is a simple absolute value curve with a slope 
discontinuity at the point where its argument is zero. Hence the 
sum is piecewise linear, with vertices at these zeros. The minimum 
must fall at one of the vertices,^ and simple comparison of the 

^ If 1 1 ? | ■ 1, the line between the vertices is horizontal and 
the minimum occurs everywhere along this line. This degenerate 
case is not important, because such a pulse has no stable AR repre- 
sentation anyway. It should be remembered, however, that absolute 
value minima are not always unique. 


two values shows that 


if 


a 


ft min 


\o\ < 1 


M > l 


(a - 1) . (129) 


For « 1 


a ft m^ " + [with 0 = (a - l) -1 ] , (130) 

and a similar analysis of the backward case gives 

+ M 84-1 ) -1 (a f* 1) 

0 

The values at minimum are related as follows: 

E f (a f,mln> " 1 + H 2 < V^.min* " 1 + W U H < 1 5 

(132) 


o,min 


if \o\ < 1 \ 
if \o\ > 1 I 


(a » 1) 


(131) 


ii 
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and 


V a i>,mln> ■ W'' + 1*1 ' V'Y.min 5 ' 1 + |o1 


if |o| > 1 . 

(133) 

Hence the minimum for either forward or backward prediction is at 

-a if |<?| < 1 

-o~ l if | c? | > 1 

which gives for the optimum A 


a, 


min 


(o - 1), 


(134) 


I a ,-c) ^ M < i 

(-c- 1 ,!) if \o\ > 1 


(a - 1). 


(135) 


(136) 


These solutions, for all values of a, are to be compared to the 
exact inverse of the pulse from which the process was formed, 

1 (1 ,-c,.j 2 ,-e 3 , . . •) if M < 1 

(. . .,-o _3 ,c -2 ,-£? _1 ,l) if |c| > 1 

The two-term L j solution in equation (135) agrees with the first 

two terms of this exact result. For |a| « 1 the filter 

(l,a- . ) - [1,-a + o(a 2 )] is approximately correct for any a > 1, 

j ,min 

and similarly for |a| >> 1 the filter ( a ^ m ^ n >l) “ [-a -1 + o(a _2 )»l] 
is a good approximation. The inequalities in equations (132) 
and (133) hold for 1 < a < 2, but the opposite sense inequalities 
hold for a > 2 (with equality for a - 2, as already noted). Thus 
any L a norm with 1 g a < 2 makes the correct decision between mini- 
mum delay and maximum delay, but a £ 2 is unsatisfactory. The best 
choice is a ■ 1, for at least in this example the resulting param- 
eter values are then most accurate. 
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This demonstration of the phase determining ability of the 
absolute value norm is for the simple case of a first order AR 
process. More general cases are difficult to treat analytically, 
but there are many good numerical techniques foT absolute value 
minimization (Barrodale and Roberts 1973, 1974; Osborne and Watson 
1971; Barrodale, Roberts, and Hunt 1970; Barrodale and Young 1966; 
Robers and Ben- Israel 1969; Barrodale 1970; Ekblom and Henriksson 
1969; Rice and White 1964; Maria and Fahmy 1974; and Claerbout and 
Muir 1973). Numerical tests (Scargle 1977, and SVIa) show that the 
L j norm does work for more complicated cases, as long as the driv- 
ing process R is at least moderately nonnormal. But a difficulty 
arises when two-sided filters are introduced, as they must be for 
this problem. 

In the above example, permitting either forward or backward 
prediction was crucial to the phase determination. In more compli- 
cated cases, for example when the pulse is mixed delay, the obvious 
generalization is to allow A to be two-sided. For example, if 
A • (a,l,b) t the predictor is 

*n - -«**+l - • (13?) 

and the prediction-error sequence is 

E n - x n - £ n - oA^ + x n + bi r n _j (E - A*X ) . (138) 

It is here that the liberal interpretation of the word "prediction" 
noted above first comes into play. The general forms are 

A * . . • , A|, 4 . • . , Ap) (139) 
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and 


P 

-n ‘ 2 

k m -q 


A k X n-k ' 


(140) 


The corresponding mean error in the L a norm is 

N-q 

*«> - s - p - „ 22 i g w i° • «*» 

n m p+l 

where as usual the sum is such that the filter A never extends out- 
side the sampled data (cf. Fig. 18). The optimization problem is 
to find the minimum of this expression with respect to the param- 
eters ...» Ay. It seems natural to not regard Aq as a free 

parameter, but to fix it at the value 1 because of the special 
nature of the prediction point. The condition Aq ■ 1 can be thought 
of as a normalization condition imposed on A to avoid the trivial 
minimum at A . - 0, all i. However, this normalization choice is 
inappropriate for two-sided deconvolution problems. Consider the 
MA process X ■ R*C, where C is some particular two-sided pulse. 

The mean Z, 2 prediction error is 


E{A) 


s-q r p y 

n » p + 1 L >- <7 


(142) 


and therefore 


3 E(A) m 


> A i 


if - p - q La 


n Lfc 


(for i + 0) . 

(143) 
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Nov insert the desired solution A ■ C“ l : 


dE(A) 


A-C* 1 


N - p 




(144) 


■ JT jr, Ev«+* • <U5) 

n fc 


the expectation value of which is 





N - p 


- a ) i) j ^n R j) C n-i-k 


n 


■ V c - f 




(146) 

(147) 


since the J?'s are mutually uncorrelated. This expression is not 
zero unless C_^ vanishes, so that in general the A which is the 
correct inverse pulse, namely C -1 , does not solve the optimization 
problem with the constraint Aq ■ 1. On the other hand, if only 
one-sided pulses are allowed ■ 0 for all i > 0, and the desired 
A does make the above derivatives zero; this A does solve the opti- 
mization problem. The choice ■ 1 is correct for causal pulses 
but not for two-sided ones. 

What can be done to ensure that the solution of the minimiza- 
tion problem is the correct inverse pulse? If is an arbitrary 
function of the other A ' s, rather than held constant, the above 
analysis yields, instead of equation (147), the set of equations 

U n 

c -i + c 0 aj: * 0 (* * °> • 048) 
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which Integrates to 


EV-i - 1 <»«»> 

v 

as a necessary condition that optimization of A yield the inverse 
of C. (Note that for causal filters this reduces to A^C^ ■ 1, and 
the conventional constraints ^4 0 ■ Cq ■ 1 are correct.) Unfortun- 
ately, equation (149) cannot be considered as a simple constraint 
on A because it involves the unknowns C^. An obvious possible 
remedy is to compute iteratively, starting with a guess for C, 
imposing the constraint in equation (149) on the minimization to 
produce a new A and a new C “ A” 1 . The convergence and uniqueness 
properties of this iteration have been studied in numerous simple 
cases. For low>order processes with little noise it converges 
very rapidly to <* unique minimum which is very much better than the 
solution with Aq ■ 1. But for more difficult problems there tend 
to be oscillations. In some cases these can be damped out very 
effectively by adopting a suitable averaging scheme for the update 
of A . But a way has not been found to predict ahead of time which 
of several such averaging procedures will succeed on a given set of 
data, nor a single procedure that is successful on all data. 
[Curiously, although the above derivation is for L 2 , the same results 
can be demonstrated for L j using the methods of Rice (1964).] 

An interesting feature of the above iteration is that, since 
none of the A's is constrained to equal 1, the identification of the 
prediction point becomes vague. Indeed the very concept of a 
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specific point singled out as the prediction point loses much of 
its significance, bit let us cell the element of A largest in 
absolute value the prediction point, just because it is often true 
with the constraint 4 q ■ 1 that |j 4^| < 1 for all i 4 0. It is found 
that as the iteration proceeds this point is not fixed but moves 
around within the filter and eventually converges to a fixed point. 
This is favorable as it eliminates what would otherwise be an arbi- 
trary parameter (the location of the prediction point, or MPT in 
the terminology in the appendix). The length of A , however, is 
still arbitrary. For least-squares problems Akaike (1970a) has shown 
how the length can be determined in an objective, automatic way, 
based on the FPE criterion. This technique Introduces the quantity 

»*«f ' V • < 15 °> 

where is the sum-of-squares of the residuals (i.e., of the 
innovation), N is the number of data points, and M is the number of 
free parameters in the model (including one for the mean value if 
this has been subtracted from the data before analysis). Starting 
from small values, M is increased until the FPE (for "final pre- 
diction error") stops decreasing and begins to Increase. One can 
Interpret the factor (N + M)/(N - AO as the statistical penalty 
that should be paid for using more free parameters. Without such 
a penalty, the residuals would always decrease as the number of 
parameters increases, so that the FPE going through a minimum is the 
signal that diminishing returns has set in. Other techniques have 
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been proposed (see especially Gray, Kelley, and Mclntlre 1977), but 
none seems easily generalized to the Lj case. However, empirically 
It has proven satisfactory to simply replace S^ 2 in equation (150) 
with Jy l , the sum of the absolute value residuals. No theoretical 
justification for this procedure has been found, and it should be 
regarded as an empirical result with quite meagre support in numeri- 
cal experiments. Short of using this, the magnitude of the resid- 
uals and the values of the model parameters must be inspected as the 
complexity of the model increases. It has been said that so ouch 
judgment is necessary in such matters that the procedure should not 
be attempted for the first time (Granger and Newbold 1977). This 

| 

seems extreme, but some limit must be placed on the order of the 
model to avoid the pitfalls of fitting too many parameters. 

This section concludes with the one analytical result uncovered 
for the problem which is as close as possible to showing that 
absolute-value optimization of a two-sided AR filter yields the 
correct deconvolution of an MA process driven by independently dis- 
tributed noise. First the following lemma is established: 

LEMMA: If X and Y are zero-mean, independently distributed 

processes, then 

(|* + r|) g max((U|), (\y\)) , (151) 

with equality if X or Y is the null process. 



First note that if X and X are independent 

(j* + X\) - jjdx dyP 2 (x,y)\x + y\ - JJdx dyP x (x)Py(y)\x + j,|(152) 


(* + y)P x (x)Py(y)dy + ^~ (* + y)P x te)Py(y)dy 


(153) 


j" d»j^2 J (x + y)i x (x)Py(y)dy + j£(x + y)P x (x)P y 
+ j dx|j j (x+ y)P x (x)Py{y)<3y + 2j* (x + y)P ;f (x)P y (j/)j . 


(154) 


The first and last of the four terms in this equation can be written 
as the integral of a nonnegative quantity, as follows 


|T*£> 

£*£ 


+ y\P x (x)Py(y)dy 


x + y\P x (x)Py(x)dy g 0 . (155) 


In fact this quantity vanishes only in degenerate cases, the most 
important ones being P y or P y = 0. The second and third terms in 
equation (154) simplify: 
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<t*+*l> - r dx[xP x (*) + <r>P x te) J + J° dxt~xP x (g) - <r>P x (*) ] + Q 


( 1 * 1 ) + <*> j aign(x)P y (x)dx + Q 


(156) 

(157) 

(158) 


- (|*|) + Q (aince <*> - 0) , 
and ao except in the degenerate caaee in which Q ■ 0 

(|* + 7|) fc (|*|) . (159) 

Since * and * are interchangeable in the above analyeis the reeult 
•tated in the leana follow*. 

Turning to the main issue, consider the process * • R*C\ we 
wish to show that <|i4**|> is ninlwua if A • C~ l [subject to the 
condition in eq. (149)]. Write 

A ■ + 6A , 


so that 


or 


where 


A*X - (C- 1 + 6A)*C*R 
- R + ( 6A*C)*R 

* *» * ]£“**„-* 


v > 

m 


(160) 

(161) 

(162) 

(163) 


(164) 
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•ad filiation (149) gives 


Hsnes 


a 0 - 0 . 



(165) 


(166) 


But sines R is indepar-dently distributed, the sum in this equstlon 
is distributed independently of R n , end the leans spplies, to re 


(I 1***1 > i <|JT|> , (. ,) 


with equality if ■ 0 for sll k, s condition equivalent to 
6 A^ m 0 for all k. Hence A ■ C~ l gives s minimum (not necessary 
unique), and we have established the 


THEOREM: If X ■ R*C, with R independently distributed 
noise, then A ■ C" 1 is s solution of the optlaisetion 
problem min(|4**|) subject to ■ 1. 


It must be cautioned that this minimization problem is not speci- 
fied in the usual way, because the constraint explicitly Involves 
the solution, and the theorem is to be understood in the sense indl' 
cated in its proof. The practical value of this result is in the 
iterative method which it leads to, as described earlier. 
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V. COMPUTATIONAL METHODS I 

I 

The goal of this section is to provide enough computational 3 

details so that the reader can apply the techniques described above 
to his own data. In outline all of the methods proceed in the same 
way, as follows: 

1. Obtain the d&ta. 

2 . Decide on the form of the model (AR, MA, ARMA, ARIMA, . . . ) . 

3. Provide a way of computing the innovation R as a function 
of the model parameters. 

4. Choose the property of R to be minimized, and provide a 
scheme for evaluating the corresponding norm D(R ). 

5. Minimize D(R) with respect to the model parameters. 

6. Compute t le physically interesting quantities from the 
optimum model found in the previous step. 

The following subsections explain these steps in turn. 

a) Sampling (Step 1) 

Assume that the sampling is in even intervals of the indepen- 
dent variable (time, position, wavelength, . . .) so we have a set 
of measured numbers X n , n ■ 1, 2, . . . , N. This is not a funda- 
mental limitation, however, as the techniques described here can be 
readily generalized to data with gaps and/or uneven sampling (§Vg). 
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b) Identification (Step 8) 


Choosing the best form (which is traditionally called identi- 
fication ) of the model is not always straightforward, and there is 
a large and complex literature on this problem (see, e.g., Box and 
Jenkins 1970; Parzen 1974; or Granger and Newbold 1977). Summarisa- 
tion of the ideas in this literature will not be attempted, but the 
following general comments are appropriate. Many astronomical time 
series can be well represented as low-order AR processes, and this 
discussion therefore emphasises AR models. Remember that a given 
process can be represented in a variety of ways (Slle), so identi- 
fication should not be viewed as finding the True Model, but as 
finding a simple, physically suggestive model which adequately 
represents the observations. Also keep in mind that this step is 
not irrevocable, once taken. Rather, the results of subsequent 
steps often suggest some revision in the form of the model. 

o) Computing the Innovation R(A) (Step 3) 

The relation used depends on the form of the model [eq. (90)]. 
For an ARIMA model, the data are differenced d times and then an 
ARM A model is fit. The most direct way of computing R is to carry 
out the operation in equation (90) with the discrete Fourier 
transform: 

R - . ( 168) 
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Note that C enters this calculation effectively as its inverse, so 
that even here the MA part of the model is converted into an auto- 
regressive representation. The only points that are not straight- 
forward in implementing this expression with the DFT are: complex 

arithmetic mist be used in the multiplications and divisions, and 
the arrays A and C must be aero-extended to the same length as the 
data before applying the transformation. It would seem that the 
result would be of the same length (i.e., N points), but to avoid 
spurious end effects the array R must be truncated somewhat, 
depending on the length of A and C, These end effects arise 
whether the Innovation is calculated with the DFT or directly eval- 
uated with a summation [cf., eq. (91) for the pure AR case]. In 
either case the innovation is defined at slightly fewer than N 
points. This is the reason for the limits NlD and N2D in the 
FORTRAN code provided in the appendix. For the pure AR case, R is 
defined at p + q (- the length of the AR filter-1) fewer than N 
points. But the values are not NlD ■ p + 1 and N2D ■ N - q, as 
would be expected from equation (91), simply because negative 
values of indices are not permitted in FORTRAN. In the code in the 
appendix R is computed as outlined above, using the DFT. Alterna- 
tively, the sum in equation (91) may be directly evaluated; for 
small values of p and q this procedure is faster than the use of 
the DFT. However, the evaluation of R is a minor part of the com- 
putation of D. For convenience the R n are re-indexed as 
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(J? n , n ■ 1,2, . . ., tf*}, where n ■ 1 corresponds to n ■ p + 1 in 
equation (91), and N* ■ N - (p + q). It Is important that A be pro- 
hibited from running over the ends of the data (see Fig. 18), to 
avoid the numerically harmful end effects (i.e., to preserve the 
"maximum entropy" condition, §IVd). 

d) The Computation of D_(R) (Step 4) 

r 

The choices for the property of R to be minimized include 
dependence (SIVc), the martingale difference property (SIVc), the 
mean-square prediction error (SlVd), and the mean absolute predic- 
tion error (§IVe). Another example is a measure of simplicity 
called the varimax norm (Kaiser 1958; Wiggins 1977; Ooe and Ulrych 
1979). In turn there are several ways to implement each of these. 
For example, we saw above that dependence could be measured in 
terms of differential or cumulative probability distributions, 
moments, characteristic functions, or expectations of arbitrary 
functions. Since the scheme Involving cumulative distribution 
functions proved much the most satisfactory, details of the other 
approaches have been omitted. The remarks about them in §IV 
should enable the interested reader to construct algorithms imple- 
menting the other approaches. Test results with all of the methods 
save those using moments (messy) and expectations of arbitrary 
functions (not tested) , will be given in §VI for comparison. 

The function to be minimized is defined in equation (98). 





1 




and F 2 m [eq. (96)], this Integral can be evaluated exactly with a 
finite double sum, as we shall now see. It is convenient to intro- 
duce an ordered version of the R n ; i.e., define an index transfor- 
mation i ■ f(n) such that if R 1 * R\ . ■ J? , then the R\ form an 

% j\n) n t 

ordered set: 


- ■ ■ sHt-ii**, • («») 

As long as Rf^ is associated with its correct neighbor in the 
unordered '*■, namely 7? n , then the integrand in equation (98) is 
unchanged by his ordering. The integral may be written as 


,*-m lf*-m 

'p m < A) - £ - F ^n )F i (R p 


AT?'. A/?'. , (170) 

* J 


^ m l J-l 


where AT?'. - R \ - R\. This sum is over a two-dimensional (unevenly 

t- t+i t 

spaced) grid of rectangles with area AT?'. AT?', and with edges at the 

values R \ , i * 1, 2, . . . , N* (see Fig. 39 in the appendix). From 

% 

definitions (94) and (96) it can be seen that F 9 (R\,R \) , F.(7?'.), 

and F.(R\) are all constant over each of these rectangles and 
1 0 

therefore so is the summand, F™ - Fj • F . Hence the sum in equa- 
tion (170) is an exact evaluation of equation (98). Of course, the 
expressions for F 1 and F™ are inexact estimates of the correspond- 
ing quantities. However, they exactly represent all the informa- 
tion contained in the given realization ot R — this is not true of 
the estimates of Pj and P 2 , since there is always some loss of 
information in a binned histogram. This is probably the main rea- 
son for the superiority of the c.d.f. approach. The advantage of 




122 


A', the ordered version of A, is that the summand can be computed 
recursively* for example, with 

- '"<**.,•*)> * Tjprrs) »[*} - "rfo J ' (171) 

[A is the step function defined in eq. (95).] This relation follows 

from the fact that no more than one new step in F 2 m begins at a 

given value of A'., corresponding to a given row in the matrix 

(A*. A'.). Further discussion of this recursion is in the appendix, 
l 0 

e) Minimisation of D p (A) (Step S) 

The minimum of Dp, with respect to the filter elements A, can 
be found with any of several standard numerical techniques; the 
simplex method is described here because it is the one the author 
happened to use, not because it has been proven to be more suited 
to this problem. The following warning should be issued with the 
simplex method (Nelder and Mead 1965; Powell 1965): After the con- 

vergence criteria have been satisfied, a restart should be made to 
check the possibility that the simplex has become degenerate or is 
otherwise unable to progress toward the true minimum. A restart 
is a reinitiation of the Iteration with a new simplex at the point 
to which the procedure appears to have converged (see the appendix). 
Another caution is that D p may have more than one local minimum. 
With numerical techniques it is never possible to be certain that 
the global minimum has been achieved. But the expression for Dp in 
equation (98) is far superior in this regard to all of the other 
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methods tried and to other ways of estimating the cumulative proba- 
bility functions. For data generated by a simple process and fit 
by simple models [e.g. , AR(1,1)], has never been found to have 
more than one minimum, and the simplex rapidly converges to the 
(globed) minimum from essentially any starting value. The conver- 
gence Is also very sure in that a restart is never needed. (Never- 
theless it is wise to try a restart in all cases, even if it is 
expected that it will not be fruitful.) As the order of the fitted 
model is increased three symptoms eventually appear: 

(1) local minima abound 

(2) restarts are frequently necessary (i.e., false convergence 
becomes common) 

and, not surprisingly, 

(3) convergence is generally slower. 

Experiments have shown that the first two of these problems are 
eliminated if m* is Increased sufficiently, typically to a value 
slightly less than p + q. Because the time to evaluate D p is 
roughly proportional to m*, the computation time increases as m* 
is increased, but the reward in sureness of convergence, elimination 
of spurious local minima, and accuracy of the solution is certainly 
worth the price. For a given data set, the procedure found to be 
best is as follows: 
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(A) Fit a very low order model , such as AR(1,1), with 
m* ■ 1. Unique and sure convergence has always obtained 
at this step, but caution suggests that one: 

(a) experiment with a variety of starting values, 
such as A “ (0,1,0), (1,1,1), or the (a.j.l.aj) 
which gives the L j minimum (i.e., minimum of 

H 

(b) try a variety of sizes for the initial simplex, and 

(c) always try restarts, with moderately large simplexes.| 
Hopefully these steps will not be necessary, and the results 
will be the same for all starting solutions and simplexes. 
However, since ill-conditioning tends to grow with the 
model complexity, confidence in the good behavior of the 
procedure at this stage is essential. If there are con- 
vergence or uniqueness problems at this early stage, there 
are several possibilities: 

(i) The process is not stationary, and V should be 
applied one or more times before modeling is 
attempted, 

(ii) an even simpler model should be used to start with, 
such as AR(0,1) or AR(1,0), 

(iii) a totally different form should be tried, such as 
MA or ARMA, or 

(iv) the value of m* should be increased (see D below). 














Determining the correct order of the model is important. If 
the order is tsksn too small • there will be residual serial corre- 
lation in the estimated innovation, indicating that not all of the 
Information about the process has been extracted. In spectral 
analysis the symptom of too small an order is that the spectrum is 
heavily smoothed — the frequency resolution has been degraded by 
using too few parameters. In deconvolution the pulse shape is 
similarly over-smoothed. In principle, taking the order too large 
is not as harmful because the extra parameters will be very small 
(provided there is enough data). In practice, however, even a few 
too many parameters cause numerical difficulties and add greatly 
to the cost of the computations. If the number of parameters 
becomes of order S (Heaven forbid!) the estimates all become 
unstable because there are too few terms in the corresponding sums. 
In general too many parameters show up as large spurious spikes in 
the power spectrum, or as wild oscillations or other erratic 
behavior in the pulses. There are many approaches to the order 
problem in the classical least-squares arena (e.g., Chow 1972a, b; 
Anderson 1963; Jenkins and Watts 1969; Akaike 1970a; Gailbralth 
1971; Lindberger 1972; Parzen 1974; Jones 1974; Graupe, Krause, 
and Moore 1975; and Tong 1975). Also, an Innovative approach has 
been developed by Gray, Kelly, and Mclntire 1977. It is not sur- 
prising that the same difficulties confront modeling with indepen- 
dently distributed innovations, as the models are Identical. The 
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steps (A)-(D) above, based on experience with both test cases and 
real data, are offered as guidelines only. It is hoped that an 
objective technique, such as the FPE (see fIVe) can be developed. 
Toward this goal, the quantity in equation (150), with 
replaced by D^ t is routinely tabulated (see fVI). In some cases 
this quantity can be helpful in deciding when the order is correct, 
but it is far fron infallible. When using the suggestion given 
above [step (B)] for increasing the order of the model, the FPE 
will be systematically underestimated, because the smaller of two 
values of D , corresponding to the two choices for the location of 
the new parameter, is selected. This could cause the quasi-FPE 
criterion to overestimate the order of the model, as occurs in the 
examples in fVI. 

A note about multiple minima: For a given total order [e.g. , 

p + q for the model AR(p,q)] there will be distinct minima for each 
of the possible choices of p and q. [For example, if p + q ■ 3, 
the four possibilities are AR(0,3), AR(1,2), AR(2,1), and AR(3,0).] 
With the current algorithm the prediction point cannot move during 
the minimisation, so that all of these choices are separate prob- 
lems. It would be helpful if a scheme to allow automatic migration 
of the prediction point could be developed, as with the minimisa- 
tion with a pseudo-constraint (five). Then all of these problems 
(with a given total order, p + q) could be solved together with a 
single minimisation. In lieu of such a procedure one must simply 


128 


compare the minima for the various choicss. Sobs judgment can ba 
uaad hers; for example, If a nodal of the fora AR(1,2) yields the 
grand nlnlnua for p + q ■ 3, it is unlikely that AR(4,0) , or even 
AR(3,1) , will give the grand minimum for p + <7 * 4. 

All of these natters will be illustrated in the examples in 

I VI. 

f) Computation of Subsidiary Quantities (Step 6) 

The point of this section is that the model parameters esti- 
mated in steps (1) to (5) are not necessarily the most Interesting 
numbers in the physical interpretation of the data. For example, 
as already aentloned, the AR paraaeters are often the most easily 
and directly calculated, but the HA pulse shape is the quantity 
for which there is a physical theory. (For example, if quasar 
light fluctuations are due to supernovas, the pulse shape should 
resemble the supernova light curve.) Hence one of the transforma- 
tions that is useful is A C. The direct way to carry this out is 
to compute 

c • *■'[* y (i?2) 

using the discrete Fourier transform, as discussed at length in 
fXXIf and explicitly shown in the appendix. But there is another 
way of evaluating the MA parameters, namely with the relation 

c - x*fi - (with - <*> - 0) (173) 

where • indicates the time reverse of *. Indeed, this is the form 
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used in the constructive proof of the (Wold) existence theorem for 
the MA pulse [see eq. (78)]. It can be thought of as the "super- 
posed epoch" method (e.g., Gosling, Hund hausen, Plzzo, and Ashbridga 
1972) because the convolution in equation (173), re- rltten as 

- £ Wk • < 17 *> 

represents the operation of shifting each pulse to bring its origin 
to a common point in time and then averaging with a weight propor- 
tional to the pulse aaplitude. All of the other, overlapping pulses 
are added in, too; their contribution averages to aero because they 
are uncorrelated with each other, but the pulse which has been 
shifted to the common origin always adds in phase. The cancellation 
of the random overlapping pulses requires that the mean of X be 
sero, which explains the need for Of) end (R> to be zero in equa- 
tion (173). This relation can be proved by noting that if X ■ R*C, 
then 

X*& - C*(R*R) . (175) 

But the expectation value of R*R is a delta function, so that the 
expected value of the right-hand side of equation (175) Is just C. 

Of course the estimate of R*R for any realization is not exactly 
a delta function, but will contain zero-mean noise for nonzero lags. 
(One can use the syasmtry of R*fl to aid in distinguishing this 
nolss from the tails of the pulse.) The estimate in equation (173) 
has several advantages over the simpler form in equation (172): 
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d" 1 Is « smoothed estimate, especially U it has a small number of 
parameters, and to aoaa extent it cmicaala tha uneartaintias In tha 
pulsa shapa. Bacauaa aquation (173) invokas tha data directly, tha 
raaulting pulaa ia lass smoothed than A ~ 1 and thus provldaa a battar 
faaling for tha varianca of tha valuaa of tha element* C n . Anothar 
ahortconlng of tha diract invar aion is that it ia nonlinaar in A 
and thus is a biaaad estimate. For example, if X vara whit a noise, 
tha axpactad valua of A is a dalta function (at laast for aoaa ways 
of deteralnlng it; cf. IVId). But d" 1 contains quadratic and othar 
avan povars of tha d^ which do not hava aaro axpactatlon valua. 
banco <d“*> is not a dalta function as it should ba. In practlca 
this bias la not Important for noat problaoa. 

Anothar lntaraatlng quantity is tha as t lost a of tha innovation. 

ft - X*X , (176) 

which is coaputad ovary tins DOHA)) is. Of coursa A la a saapla 
aatiaata and rafars to tha pulaa aaplitudas in tha particular 
realisation of tha data at hand. It is tha bast (optiaun) aatiaata 
of tha aaplitudas with which tha pulses. C % occurred to produce tha 
observed realisation. Kota that since ft ■ A*X, if C ■ d” 1 it fol- 
lows that X ■ ft*£ exactly. That is. the nodal has sufficiently 
aany degrees of freedom to reproduce the saapled data exactly. 

Thera is thus never any question of how wall tha data is fit. Tha 
questions era: How randoa (independent) is tha a st lasted pulsa 
amplitude sequence? Bow physically reasonable is tha eat lasted 



pulse shape? The amplitudes may be less interesting than their 
distribution, so it is often useful to construct a histogram which 
is an estimate of the amplitude distribution. 

One -an also readily compute the autocorrelation function and 
power speerrrm of X , directly from A [cf. eqs. (65) and (69)]. 

g) Gaps and Uneven Sampling 

Any technique based on prediction-error filters can be readily 
adapted to data which does not have the simple sampling assumed in 
SVa, for there are ways of generalizing the concept of the output 
of such filters (with the input data unevenly sampled) . 

Consider first even sampling with one or more gaps. The case 
of one gap is easily generalized to an arbitrary number. We 
describe one gap in terms of two index sets for the independent 
variable : 

^X n ;nSS 1 ,nSS 2 ^ . (177) 

For example, a gap of length m could be represented with 

■ (1, 2, . . . , and S 2 s (N^ + w + l, JYj+w+2, . . ., N 2 ). 
There are two subcases as given in the following two paragraphs. 

No coherence across the gap.- There are situations where the 
length of the gap is unknown (so that the second segment cannot be 
phased relative to the first), the gap is not an integer number of 
the sampling intervals, or where it is believed (or assumed) that 
the process is not coherent across the gap. For example, in a pure 
MA process there is no coherence across a gap wider than the total 
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extent of the pulse. Even If the pulse Is Infinite, the coherence 
will diminish rapidly as the gap exceeds, say, twice the FWHM of the 
pulse. The case of no coherence is the easiest to handle. One 
simply redefines the function D as a sum over the index sets taken 
separately. That is, if D % (A) is the norm evaluated on the data for 
index set i, treated as if it were the only data available, then 
define 

D(A) - £ D V (A) , (178) 

i 

where the sum is over all the relevant index sets. The minimiza- 
tion of this total D is exactly as before. 

Coherence across the gap.- It is rare that information is 
coherent across anything but a small gap, the most notable excep- 
tion being signals consisting of phase coherent sinusoids or other 
deterministic functions. If it is desired to retain such informa- 
tion, the technique just outlined cannot be used, as the filters 
are never applied to data on both sides of the gap simultaneously. 
The basis of a method for such cases has been suggested indepen- 
dently by several workers: Use (one-sided) prediction error filters 

to fill in the gap(s), and then optimize a new filter on this 
interpolated data. There are various choices as to how to merge 
the predictions (one from the right and one from the left) at the 
center of the gap. The final filter will not contain any informa- 
tion not already contained in the optimizations on the Individual 
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data segments, unless It is longer than the gaps. An example of 
this technique is given by Ulrych and Clayton (1976) . 

Arbitrary scantling.- Consider the case where there are not 
just a few gaps in otherwise even sampling, but where the time 
points are arbitrary, (see §1 and §11). Discrete AR represen- 

tations are applicable only to the special case where the sampling 
times are evenly spaced, because the optimization requires sliding 
the filter along the data (see Fig. 18). But the simple generaliza- 
tion to continuous filters allows arbitrary sampling. The predic- 
tion error, given in the discrete case by equation (114), is 

R n - X n + fx(s)A(t n - s)ds (179) 

and the Integral is replaced by a sum, yielding 

R„-x„+ E Ut k )A{ t„ - t k ut k . (180) 

Since A(t) is continuous, it does not matter that the intervals 
t n - are not all the same. To parameterize the function A so 
that the optimization can be carried out with respect to a set of 
discrete parameters rather than a continuous function, introduce 
the expansion 

A(t) - £ A k*k W ’ <181) 

/c 

where the 4>^(£) are a set of continuous functions which must be 
specified. The problem has been reduced to the same form as before — 
the innovation defined [by eqs. (180) and (181)] in terms of a 
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discrete set of parameters, {4^} . The optimization can be carried 
out as before, and the pulse shape and amplitude sequence, auto- 
correlation function, or power spectrum can be evaluated much as 
before. The author has carried out limited experiments with the 
choice A t =“ 1/2 (i., - t ,) and the expansion [eq. (181)] given 
as either a Fourier series or a power series. While encouraging, 
the results will not be described here as there was moderate depen- 
dence on the choice of the functions ^(t), the number of terms 
kept in equation (181), the length of the operative time interval 
[i.e., the number of values over which the 7c- sum in equation (180) 
is evaluated], etc. Good methods of selecting these must be 
developed . 


VI. NUMERICAL EXPERIMENTS 

The best way to evaluate a deconvolution procedure is to try 
it out on artificially generated data of known characteristics. 

All of the test problems described here are low-order autoregressive 
processes, with specific choices for A. The time series were 
actually generated by filtering an innovation R with the inverse 
C m A~ 1 (thus representing the process as a high-order moving aver- 
age) . The innovations are of the form R * if 1 ; U is a sequence of 
independent random numbers, distributed uniformly on (0,1), and 
if 1 means simply that U is raised to the nth power, term by term. 

As seen in Figure 5, a large value of n gives a few large amplitudes 
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and approximates the shot noise process. The other limit, small n, 
corresponds to much pulse overlap (i.e., many large amplitudes 
instead of a few) and takes on the appearance of a normal process. 
The higher the value of n, the less pulse overlap there is and the 
easier deconvolution should be. In the extreme case of normally 
distributed R the overlap is so great (to the point that X » R*C 
is also normally distributed) that no method can recover phase 
information, and the deconvolution problem as meant here (i.e., 
with correct phase) is intrinsically unsolvable. Any technique 
should give progressively worse results as n is decreased and should 
be completely unable to recognize phase properties as R approaches 
normalcy. These expectations are borne out by the experiments 
about to be described. White noise with several variances is added 
to some of the test data sets, so that the time series is of the 
form 

X * lP*A~ l + o n 2 N , (182) 

where N is Gaussian noise of unit variance. 

a) Experiment J — Comparison of dependence Measures 

The dependence measures introduced in §IVc were tested on the 
process defined in equation (182), with A - (-0. 2,1, -0.3) . The 
corresponding Inverse pulse C m A~* is a two-sided exponential 
which rises somewhat more rapidly than it decays. Table 2 presents 
results for a sequence of innovations ranging from n » 40 (highly 
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nonnormal, pulses essentially isolated, easy for almost any tech- 
nique) to n ■ 1 (nearly normal, much pulse overlap, difficult for 
any technique). No noise was added. Note that optimization 
(with i4g > 1) is exact^ for large n but degenerates quickly as the 

## 

This is a general property of Lj, and is related to the fact 
that the 2^-optimum solution of an over-determined set of linear 
algebraic equations always solves a subset of the equations exactly, 
as was realized by Laplace (see Claerbout and Muir 1973) . 


pulse overlap increases. The iterative L j procedure (§IVe) degen- 
erates much more slowly as n decreases and would have made an 
impressive entry in Table 2. However, difficulties with convergence 
on more difficult problems make this technique, as implemented, 
unacceptable as a general-purpose method. Surprisingly the martin- 
gale difference property method fails badly, even for the easy U 1,0 
problem. This failure is unfortunate in view of the simplicity of 
the technique. Further development of the MDP approach may be 
fruitful. 

The results shown for probability distribution functions (PDF) 
were calculated with five equally spaced and equal bins in tf-space 
[25 in (i? n ,/? n+ffl )-space] , chosen to float with the changing values 
of the minimum of R(A) and maximum of /?(/!), as this was empirically 
found to be better than having fixed bins. For some problems it 
is preferable to choose the i?-bins so that roughly equal numbers of 
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TABLE 2 


TEST RESULTS 


Innovations with Various Distributions: R ■ iP 
Pulse Shape: Two-sided exponential (-0.2, 1, -0.3) -1 

Length of Data: N » 100 (Averages of four such realizations) 


n 

CPF-method* 

PDF-met hou* 

MDP-method* 

L j -opt imizat ion 

40 

-0.200,1,-0.300 

-0.195,1,-0.296 

-0.194,1,-0.419 

-0.200,1,-0.300 

9 

-0.202,1,-0.309 

-0.207,1,-0.294 

-0.219,1,-0.251 

-0.230,1,-0.306 

4 

-0.191,1,-0.305 

-0.169,1,-0.250 

-0.041,1,-0.453 

-0.318,1,-0.328 

1 

-0.201,1,-0.348 

-0.582,1,-0.017 

-0.257,1,-0.148 

-0.509,1,-0.503 


^Maximum lag, m* ■ 1. 


points fall in them. Gaussian weight functions for the bins were 
used to combat the quantization problem outlined in §IVc. The results 
are substantially dependent on the number and placement of the bins, 
and at best the test answers are less accurate than those obtained with 
cumulative probability functions (CPF). In addition, the convergence 
properties of Dp, although better than those of the other dependence 
measures (based on characteristic functions, moments and the MDP), are 

much worse than those of D„. 

F 

Table 3 displays the results of similar tests dealing with the 
effects of additive noise or the computations. With R fixed at £/ 9 , 
various levels of noise were added according to formula (182). In 
both comparisons the cumulative probability function method is 
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TABLE 3 


TEST RESULTS 

Various amounts of Additive Gaussian Noise: o ^ ■ Noise 

Variance (Pulse peak * 1) 

Pulse Shape: Two-sided exponential (-0.2,l,-0.3)“ l 

Length of Data: N ■ 100 (Averages of four such realiza- 


tions) 

Innovation: R ■ U* 


°N 

CPF-method* 

PDF-method* 

L j -opt imizat ion 

0.00 

-0.202,1,-0.309 

-0.207,1,-0.294 

-0.230,1,-0.306 

0.01 

-0.202,1,-0.300 

-0.230,1,-0.282 

-0.239,1,-0.317 

0.05 

-0.184,1,-0.261 

-0.130,1,-0.258 

-0.232,1,-0.339 

0.10 

-0.169,1,-0.200 

+0.003,1,-0.133 

-0.183,1,-0.351 


^Maximum lag, m* ■ 1. 


superior to each of the others. The problem with R ■ i/ 4 and only 
100 data points is very difficult, and compared to any other 
method tested the current one does amazingly well. Tables 2 and 3 
do not represent enough trials to be definitive, but they indicate 
trends confirmed by other computations which are not presented here. 

b) Experiment 2 — Detailed Study of an AR(l t l) Process 

This experiment is an intensive study of a process similar to 
that in Experiment 1. The aim is to study in detail a relatively 
difficult problem, namely deconvolution of the AR(1,1) process 
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(183) 


X - U^A’ 1 + 0.05/Y , 
where A ■ (-0.2, 1,-0. 3) is the same as in Experiment 1. This choice 
combines a moderately high noise level (cf. Table 3) and a low 
value of n (cf. Table 2), and presents a rather difficult problem. 
The solid line in Figure 20 is a realization of this process with 
N - 100. 

Table 4 is a summary of the results of minimizing D p with five 
different values of m*. In all cases the starting solution was 
(0,1,0), and convergence to the AR(1,1) solution shown in the Table 
was rapid — in no case did restarts lead to significant changes in 
either of the parameters. The procedure was then to optimize both 
AR(1,2) and AR(2,1) filters, using as starting values the AR(1,1) 
solution with a zero appended. What is shown in the next line of 
the Table is the third-order ( M * 3) solution which had the smaller 
value of the minimum Dp of these two cases. This process is then 
repeated. At each step, the filter may grow to the left or to the 
right, according to which produces the smaller Dp. Let us examine 
the convergence in this process, starting with m* ■ 1. The quantity 
tabulated in the second column is 

D * V°ini«u®) (f-r|) ^ • <184) 

by analogy with equation (150) , thus including the penalty for the 
number of parameters in the model. It is hoped that this quantity 
might have the property that makes the FPE useful: As a function 

of M (the number of free parameters), a minimum of D indicates that 
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TABLE 4 


DECONVOLUTION OP (-0.2,1, -O.S)" 1 *!^ 3 + O.OSff 


(1) 

(2) 

(3) 

A 

HA) 

M 

m* - 1 

0,1,0 

2.1078 


-0.164,1,-0.304 

0.1555 

2 

-0.207,-0.112,1,-0.312 

0.1117 

3 

-0.076,-0.137,-0.105,1,-0.302 

0.0810 

4 

m* ■ 2 

0,1,0 

1.2218 


-0.1664.-0.308 

0.1444 

2 

+0.033,-0.183,1,-0.276 

0.1437 

3 

+0.003,-0.1774,-0.283,-0.010 

0.1328 

4 

«* - 3 

0,1,0 

0.9270 


-0.148,1,-0.331 

0.1557 

2 

+0.048,-0.188,1,-0.281 

0.1502 

3 

+0.013,-0.177,1,-0.282,-0.004 

0.1495 

4 

-0.010,+Q. 020,-0.145,1,-0. 292, +0.008 

0.1339 

5 

m* ■ 4 

0,1,0 

0.8238 


-0.205,1,-0.316 

0.1945 

2 

+0.070,-0.245,1,-0.283 

0.1868 

3 

+0.014,-0.197,1,-0.302,-0.004 

0.1967 

4 

-0.010, +0.013, -0.153, 1,-0. 295, +0.024 

0.1842 

5 
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TABLE 4 
Concluded 




(2) (3) 

D(A) M 


f 

m* ■ 5 


; 


0,1,0 

0.7934 

i 


-0.235,1,-0.318 

0.2126 

2 

1 

+0.071,-0.264,1,-0.275 

0.2069 

3 j 

r 

+0.066, +0.076,-0.266,1,-0.256 

0.2184 

4 

h 

-0. 002 ,+0. 050, -0.222 ,1,-0. 261 , -0.005 

0.2228 

5 W 


the correct order M has been reached. But for m* - 1 this quantity 
keeps on decreasing with Af, giving no indication of reaching a 
minimum. Also the values of the new parameters are not small, so 
there is no indication of convergence at all. This situation is 
greatly improved for m* ■ 2, as the new parameters (+0.033 and 
-0.010) are relatively small. In addition, while D does not reach 
a minimum, it decreases quite slowly with M. One might guess that 
the correct order is AR(1,1) (i.e., M ■ 2) from the entries in 
table 4 for m* * 2. The improvement continues for m* ■ 3. Starting 
at m* ■ 4 there is a minimum in D, at M - 3 (the correct order is 
M ■ 2), and the value of the extra parameter 4_ 2 (which should 
be -0) is small, 0.07 in both cases. Starting with m* ■ 4, and 
especially at w* ■ 5, the values of the parameters change signifi- 
cantly from the values they had for lower m*. It appears from this 
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experiment that if m* is too low (1 or perhaps 2 in this example) 
or too large (5 or perhaps 4) the results are not as good as they 
are for an interned late value. This result is as expected: If m* 

is smell, sons of the infornation to be gained by reducing the 
dependence in A at larger lags is lost. If m* is too large, the 
infornation trill be diluted as the minimization trill try to reduce 
dependencies at large lags where there are none to reduce. This 
suggests taking m* ■ 3 l 1 in the present experiment. Figures 20 
to 22 show results for the M ■ 3, m* ■ 3 solution (which is very 
slailar to H ■ 4, m* ■ 3 and to M ■ 2 or 3, w* ■ 2). The dashed 
line in Figure 20 is the estimated innovation. Figure 21 conpares 
this with the exact innovation from which the realization of X was 
constructed. This estimate and tin corresponding pulse (compared 
with the exact one in Fig. 22) are very accurate. Figures 23 and 24 
present slailar comparisons for the somewhat different solution 
corresponding to M » 3, m* • 5 (which is similar to M ■ 3, m* ■ 4), 
which might have been selected from Table 4 if the quasl-FPE cri- 
terion were taken seriously. This solution yields slightly poorer 
reproductions of the innovation and the pulse shape (although the 
latter is difficult to see in coopering Figs. 22 and 24). 

o) Experiment 5 — An AR(2 t l) Proaeee 

The goal of this test is to see what happens if the process is 
aore complicated. In particular, we will see to what extent the 
quasl-FPE criterion (s minimum in the function r (M) given in 
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FIG. 22. -Comparison of the estimated (solid line) and exact 
(dashed line) pulse shapes for the process shown in Fig. 20. The 
solution shown is A * (0.048,-0.188,1,-0.281) (obtained with 
m* ■ 3). 
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FIG. 23. -Comparison of the exact (dashed line) and estimated 
(solid line) innovation for the process shown in Figure 20, but 
corresponding to a different solution, namely A ■ (0.071,-0.264,1,-0.275) 
obtained with m* ■ 5. This result illustrates that the value of m* 
can be too large. 
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FIG. 24. -Comparison of the exact pulse (dashed line) with the 
pulse derived from the solution mentioned in the caption to 
Figure 23. 
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eq. (184)] is useful in determining the order of a higher-order 
process. The process chosen is again given by the basic form in 
equation (182), with A ■ (-0. 3, 1,-0. 2,-0. 3) , n ■ 9, and o^ ■ 0. 

This should be an easy problem because the innovation ( U 9 ) is so 
highly nonnormal and because there is no noise. This was done 
purposely, to minimize the confusion due to noise and excessive 
pulse overlap, thus isolating the order-determining problem. The 
realization studied here is plotted in Figure 25. Table 5 summar- 
izes the minimization, in the same format as in Table 4. Because 
the AR filter generating the process is longer, a larger range of 
values of m * has been included. As before, the starting solution 
was the simple AR(1,1) with A ■ (0,1,0). The results for m* ■ 1 
are very poor, as might be expected, as A ties together values 
separated by up to three lags, so a lag of one appears to be inade- 
quate. As expected, the results are much improved for m* ■ 2 and 3. 
For m* ■ 3 and 4, the result is essentially perfect, in that the 
quantity D goes through a minimum at the correct order, the param- 
eter values are almost the same for the two values of m*, and the 
values of the higher-order parameters (Af ■ 4, 5, . . .) are very 
small. For m* ■ 10 the minimum in D occurs at M m 4, too large by 
1, but again the extra parameter is very small, so that this solu- 
tion is essentially identical (e.g., in terms of the corresponding 
pulse shape) to the solutions for lower values of m* which are of 
the correct order. Figure 26 shows the innovation and Figure 27 
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FIG. 25. -Realization of the AR(2,1) process described in the 
test (U*, no noise). The dashed line is the estimated innovation 
corresponding to the solution 4 - (-0.27,1,-0.202,-0.323) obtained 
for both to* - 2 and to* - 3. 
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TABLE 5 

DECONVOLUTION OF (-0.3,l,-0.2,-0.3)' l *£/ 9 



A 


M 

m* ■ 1 

0,1,0 

4.232 


-0.256,1,-0.336 

0.1331 

2 

-0.274,1,-0.107,-0.577 

0.0906 

3 

-0. 288 ,l,+0. 017 ,-Q .874.+0. 106 

0.0747 

4 

-0. 368,-0. 207 ,l,+0. 119, -0.761.+0. 256 

0.0327 

5 

-0.347,-0.220, 1,-0.017, -0.563, +0.362,-0. 164 

0.0319 

6 

m* » 2 

0,1,0 

4.026 


-0.132,1,-0.462 

0.5114 

2 

-0. 269,1,-0. 202,-0. 323 

0.1002 

3 

-0. 262,1,-0. 194,-0.316,-0. 060 

0.0967 

4 

-0.260, 1,-0.120, -0.409,-0. 128,+0. 174 

0.0924 

5 

-0. 256,1, -0. 138,-0. 274 ,-0. 197 ,-0. 101 ,+0. 318 

0.0698 

6 

m* • 3 

0,1,0 

3.547 


-0.170,1,-0.441 

0.3683 

2 

-0.268,1,-0.202,-0.323 

0.0884 

3 

-0.273,1,-0.201,-0.321,-0.0001 

0.0921 

4 

-0.256,1, -0.233, -0.324, +0.043, -0.024 

0.0938 

5 

-0. 307, -0.226,1,-0. 180,-0. 261.+0. 204, -0.359 

0.0686 

6 


m* » 4 


0,1,0 

3.168 


-0.129,1,-0.482 

0.3565 

2 

-0.272,1,-0.201,-0.322 

0.0749 

3 

-0. 276 4L, -0.199, -0.318, -0.002 

0.0780 

4 

-0.279, 1,-0. 210, -0.318, +0.019, -0.003 

0.0799 

5 

-0. 237 ,1,-0. 242,-0. 376, +0.127.+0. 067, -0.192 

0.0698 

6 


S 
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TABLE 5 
Concluded 



0,1,0 

1.8225 


-0.126,1,-0.497 

0.2519 

2 

-0.274,1,-0.201,-0.323 

0.0919 

3 

-0. 271,1,-0. 202,-0. 322,-0.003 

0.0869 

4 

-0.274,1,-0. 219,-0. 316.+0. 016, -0.008 

0.0924 

5 

-0.282,1, -0.224, -0.321, -0.040, +0.002, -0.029 

0.0866 

6 








the pulse shape estimates from the M ■ 3 solution for m* ■ 2 or 
m* ■ 3 (the A’s are essentially identical , and, for example, the 
pulse shapes would be indistinguishable In Fig. 27). In each caae 
the estimate is compared with the exact quantity. Both the pulse 
shape and the Innovation are reproduced very accurately. Note that 
there is a large amplitude pulse which occurred very near the 
beginning of the realization. The pulse actually occurred prior to 
the first point of the estimated innovation, but it is shown in 
Figure 26 to stress the point that pulses very nesr the end and 
beginning of the realization are not represented accurately because 
of end effects. Nevertheless the part of any such pulse that 
extends into the realization is Included in the determination of 
the model parameters. 

Table 6 and Figures 28, 29, and 30 (for M ■ 4) present the 
deconvolution of the same realization just discussed, but with 

TABLE 6 

DECONVOLUTION OF (-0.3,1,>0.2,-0.3) -1 *^ + 0.050 





FIG. 28. -The came realization shown in Figure 25, but with 
added noise of variance G.05. The dashed line is the innovation 
derived from the solution A ■ (-0.310,1,-0.075,-0.333,-0.088) 
(obtained for m* ■ 3). 
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FIG. 2 9. -Comparison of the estimated ( sol Id line) and exact 
(dashed line) innovations for the process shown in the previous 
figure. 
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«dd«d Gaussian noise of variance 0.05. These results show that ths 
accuracy of the paraaeters determined above for this third order 
process is not due to the absence of noise. The Innovation shows 
Increased variance, including the appearance of small negative 
amplitudes which arc not present in the actual Innovation. It 
appears that the effect of additive noise is to add noise to the 
estimated innovation, but it cannot be determined whether the dis- 
tribution of the noise in the innovation is also Gaussian. Fig- 
ure 30 shows that the basic shape, including the secondary peak, of 
the pulse is retained but the tail of the pulse is altered somewhat. 

d) Experiment 4 — Gauettian Noise 

One can consider Independently distributed noise as the convo- 
lution of an Independently distributed Innovation with a delta func- 
tion. When applied to noise, the deconvolution procedure should 
produce a delta function pulse. This experiment wes designed to 
teat the procedure on Independent Gaussian noise. The solid line 
in Figure 31 is the noise analyzed. The minimization was done for 
the single value m* ■ 2. The quasl-FPE did not dearly Indicate 
convergence, but this hardly matters because all of the solutions 
were dose to delta functions. The dashed line in Figure 31 la the 
estimated innovation (plotted with a different scale), and as desired 
is very nearly the same as the data Itself. The pulse shape shown 
in Figure 32 is the inverse of the best third-order solution 
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FIG. 32. -The pulse shape derived for the data shown in the 
previous figure, corresponding to the A given in the text. The ideal 
solution would be a delta function. The horizontal scale of this 
figure is about three times that in Figure 31. 
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A * (-0.061,4-0. 072, 1,4-0. 134) and Is not far from the desired delta 
function (|C n | is <0.1 for all n j* 0). 

e) Experiment 5 — A Sine Wave 

The technique we have been discussing was designed for random 
processes, and it could easily break down in the presence of a 
deterministic part to the data. This experiment tests this possi- 
bility, using a sine wave as an example of a deterministic process. 
If a sine wave is considered as a HA pulse (which would be unstable , 
as the coefficients do not converge), the corresponding AR filter 
has a zero on the unit circle. [Compare to the case A ■ (1,1), with 
C ■ 4" 1 ■ (1, -1,1, -1,1, -1,1, . . .).] When applied to a pure sine 
wave the simplex minimizer had convergence difficulties, and 
dropped by a factor of 10 30 during the minimization. The pulse 
shapes that it was leaning toward, however, were more or less 
sinusoidal. Since the pure sine wave is a singular case, a small 
amount of noise was added, so that the data were given by 


X n - sin(0.5n) 4- 0.00250 , (185) 

where as before N is unit variance Gaussian noise. This addition 
removed the convergence problems, and the solution 
A m (-0.419,-0.070,1,-0.813) was obtained with m* ■ 3. Figure 33 
shows the data as a solid line. In this case the interpretation of 
the innovation (dashed line in Fig. 33) is not straightforward. A 
sine wave is a single pulse, not a random sequence of pulses. But 
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this model appears to represent a sine wave as a random sequence of 
the pulses shown in Figure 34 (i.e., the inverse of the above solu- 
tion for 4), basically a damped sine wave. Remen&er that because 
of the way the innovation is calculated, the data are exactly repro- 
duced (except near the ends) by the expression P*A~^ , so the inno- 
vation in Figure 33, convolved with the pulse in Figure 34 (not all 
of which is shown), reproduces the data. 

f) Experiment 6 — 3C 273 

Data on the optical variation of the Quasar 3C 273 (Kunkel 
1967) have been analyzed by a number of workers looking for periodic- 
ities and for pulses (the closest in philosophy to the present work 
are Fahlman and Ulrych 1975, 1976). A future paper will give the 
details of the analysis of these data using the CPF-method, but 
preliminary results will be given here to demonstrate the applica- 
tion of the technique to real data. In particular the issue of 
determining the amount of a possible constant component to the light 
curve is raised. The point is that there are two contributions to 
tb mean value of the data: (1) a background constant, due for 
example to light from a source other than the one which is pulsed; 
and (2) the mean value of the (positive only) pulsed component. If 
the pulses are sparse enough, there will be a part of the time 
series where the contribution from pulses can be neglected, and 

then the minimum value of the curve, minOL), would be a good esti- 

n 

mate of the background constant. But in general there can be 
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enough pulse overlap at all times that thl3 procedure will over- 
estimate the constant. Indeed the deconvolution Is nontrivial only 
when there is much pulse overlap. In such cases it is known only 

that the constant lies between 0 and min(lT n ). We will see below 

n 

that this problem in some circumstances can be solved with the cur- 
rent technique. 

Figure 35 depicts the light curve in linear intensity units, 
while Table 6 tabulates the results of the minimizations. This is 
a relatively long time series (N m 292 in the original data; the 
first four points were discarded so that N ■ 288 because the FFT 
algorithm requires that the largest prime factor of N be £23). 

Since the number of operations scales as N 2 , the reductions are 
moderately time consuming. For example, the run with N ■ 288, 
m* ■ 3, and M ■ 2, 3,4,5, 6 took 1,140 CPU seconds on the NASA-Ames 
CDC 7600. It will be noted that D does not go through a minimum, 
although for m* ■ 3 it is virtually stationary for M » 4 and 5. 

Also, the parameters 4_ 3 and especially A +2 are dmall. This sug- 
gests that the M » 4 solution is to be adopted, but further compu- 
tations with larger m* will be necessary before this can be made 
definite. The pulse shape is shown in Figure 36 and is compared 
with the minimum delay pulse determined by Fahlman and Ulrych (1975) 
with M ■ 3 (as determined by a legitimate FPE criterion) . The 
innovation for this solution is shown in the lower part of Figure 35 
and is compared to the innovation from the minimum delay solution 
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FIG. 35. -The historical light curve of 3C 273 (top), derived 
directly from the magnitudes given by Kunkel (1967). The intensity 
is on a linear scale in arbitrary units, and the time covered is 
28,800 days. The estimated innovation shown is for 
A - (-0.081,0.265,-0.740,1,-0.419) obtained for m * - 3. 
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TABLE 7 


DECONVOLUTION OF THE LIGHT CURVE OF 

3C 273 


N - 288 

A 


M 

m* - 1 

0,1,0 

1.832 


-0.535,1,-0.519 

0.001 677 

2 

+8.181,-0.696,1,-0.450 

0.000 6955 

3 


m* - 2 


0,1,0 

1.321 



-0.516,1,-0.500 

0.000 

2493 

2 

-0. 503,1,-0. 540.+0. 028 

0.000 

2122 

3 

-0.495,1, -0.569.+0. 078, -0.028 

0.000 

2108 

4 

m* ■ 3 

0,1,0 

0.809 

5 


-0.523,1,-0.528 

0.000 

6587 

2 

+0.146,-0.655,1,-0.462 

0.000 

3273 

3 

-0.081, +0.265, -0.740, 1,-0. 419 

0.000 

2883 

4 

-0.082, +0.264, -0.741,1, -0.421, +0.002 

0.000 

2879 

5 

-0.029, +0.212,-0. 713,1, -0.469.+0. 081, -0.063 

0.000 

2683 

6 
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in Figure 37. Both innovations have substantial numbers of nega- 
tive amplitudes. 

The author has carried out numerical experiments similar to 
those discussed by Fahlman and Ulrych (1976)* confirming their 
contention that such behavior can have two causes: (1) noncon- 

stancy of the pulse shape, or (2) use of a minimum delay solution, 
if the actual pulse is not minimum delay. The point in (1) is 
that '.he pulse shape may actually be changing, say in a random but 
stationary way, rather than being constant. The MA representation 
is still exactly correct, as long as X is stationary, but it uses 
a single pulse shape. This shape is a kind of time average of the 
actual pulse shape. (It is not simply representable as a time 
average, however; the deconvolution procedure yields some kind of 
nonlinear average of A , then C is the corresponding inverse.) When 
a pulse with a shape close to this average is convolved with the 
optimum A, a delta function results, as desired. But if the shape 
is somewhat different from the average, this convolution produces 
something other than a delta function. Simulations consisting of 
two or three distinct pulse shapes occurring randomly and indepen- 
dently show that the resulting amplitude usually consists of a 
f irst-negative-then-positive (or vice versa) spike, liUe the dis- 
crete version of the derivative of a delta function. Such spikes 
can be seen in the innovations in Figure 37. The form of the spike 
appears to be sensitive to the delay character of A, as the simul- 
taneous spikes in the two Innovations are sometimes quite different. 
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FIG. 37 . -Comparison of the innovation* derived from the minimum 
delay (top) and mixed delay (bottom) solution* as in Figure 36. 

Note that the negative spike* are typically associated with nearby 
positive spikes; however, the pattern of this association seems to 
be different in the two Innovations. 




Effect (2) it quite simi’.ir, because the optimum minimum delay 
A ia not the correct inverse of a mixed-delay pulse; and its con- 
volution with the actual mixed-delay pulse will also produce other 
than a delta function. From the fact that the mixed-delay result 
shown here contains roughly the same amount of negative amplitude 
as the minimum delay result (Fig. 37), it appears that in 3C 273 
the pulse shapes are indeed varying, and the negative amplitudes 
found by Fahlman and Ul^r.. u»76) are not due to the minimum delay 
assumption. (It is possible, but unlikely, that ihere is an addi- 
tional source of negative amplitudes.) 

There is one facet of the distribution function approach 
(either cumulative or differential) which is very useful, namely 
that it is completely insensitive to an additive constant in the 
data. The only factor that enters into the expressions for Dp or 
Dp is the shape of the lolnt and individual distribution functions. 
Adding a constant merely shifts the position of the functions on 
the /?- axis and does not change their shapes. Hence D is invariant 
to a shift in X, a property not shared by other deconvolution 
techniques. This invariance is important because it means that it 
is possible to estimate the size of the background component . . . 
if something is known about the distribution of the amplitudes. 
First, note that a constant In the data shows up as a constant in 
the estimated innovation — if one has the correct Inverse pulse. 
For, letting A be the constant unit process: 
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K* - 1 (n - 1, 2, 3, . . ., N) , (186) 

m can writ* 

X - R*C + aX , (187) 

where a ia mm unknown constant. The eat luted innovation is 


R - A*X • fl*(C*-4) + - fl*(CM) + 

and if A - C 1 

- R + ^ A^jx , Q.E.D. 


( a 5 '*) 


X , (188) 


(189) 


The second tern on the right is a constant, but it is not yet 
obvious how to determine its value (and hence the value of a) , 
because we know only </?>, and not 0?>. If it was known, or one 
wished to assume, that (R) m 0,. then 


<R) 


Za, * 

k K 


090) 


But the case (R) J 0 Is of particular Importance in astroneny. 

For exanple, suppose that the actual amplitudes are positive only 
(as with light pulses), with a distribution which is either finite 
at R ■ 0 or goes smoothly to zero (so that sou pulses have ampli- 
tudes close to zero, but none are negative). Then 


(£ A *) ■ ■s n(J> » ) 


(191) 


could be used to estlute a. However, observational errors produce 
a variance in R which would uke this estlute biased toward too 
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small a. This bias could be eliminated if the center of symmetry 
of the (presumably Gaussian) distribution of these observationally 
induced errors in R could be recognized. But an even larger prob- 
lem with the estimate in equation (191) for the innovation of 
3C 273 is the incidence of the large negative amplitude spikes. 

One must turn to more qualitative aspects of the distribution of 
R n . Specifically, the innovations in Figure 37 appear to have a 
definite background level (possibly better seen in the mixed-delay 
solution case), indicated in the figure with horizontal lines. This 

A 

level corresponds to the peak in the distribution of R (which is 
Fig. 38) , and is probably best estimated with the median of R (to 
avoid the bias in the mean value which the real pulses might pro- 
duce) . In the case of 3C 273 the mean and median are not very 
different, as the entire distribution is nearly symmetric (there is 
possibly a slightly significant bias on the positive side of the 
distribution shown in Fig. 38). In summary, the mean level for 
3C 273 cannot yet be determined unambiguously because of the effect 
of the negative amplitudes in the innovation, but the levels shown 
iii Figure 37 are reasonable guesses for this background of non- 
pulsed light. 

In some other deconvolution methods the mean value of X is 
removed, and this is an example of a shift which may alter the 
deduced pulse shape. In oaiticular, the optimum-prediction-error- 
filter method is usually applied to data that lias had the mean 
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FIG. 38. -Histogram showing the distribution of the .lse 
amplitudes shown in Figure 37 (bottom). A Gaussian curve fitting 
the central few bins is drawn for comparison. The overall dis- 
tribute a is definitely not purely Gaussian. It may have a 
Gaussian component, possibly connected with the observational 
scatter in the data. There may be a small asymmetry favoring the 
positive amplitudes, but the negative amplitudes (which are prob- 
ably due to pulse s'- >pe variation) are nearly as numerous — this 
prevents the zero 1 >'l G f the amplitudes t rom being determined 

unambiguously. The tesults for the inno'ution derived from the 
minimum delay solution (top curve in Fig. 37) are very similar. 


L 


MUM 
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subtracted out, because the form of equation (114) implies that, 

since the mean prediction error should vanish, either < X > or 22 

i 

must vanish. If the sum of the 4. vanishes, 4(a) has a zero on 
the unit circle, and A itself is not invertible because 4“ 1 does 
not converge. Indeed, it is found in numerical trials that if the 
mean of X is left in, the resulting 4.'s sum to zero and A cannot 
be inverted. But if (X) is 0, 4 is well behaved. This is prob- 
ably the basis on which Fahlman and Ulrych (1976) state that their 
analysis "... only makes use of the variance in the light curvt . 
Hence the pulse shape ... is unaffected by the presence of a 
background." However, one is not justified in subtracting out the 
mean just because the analysis breaks down otherwise. 

g) Discussion 

The minimization of D„ appears to be a powerful deconvolution 

C 

technique for moving average, autoregressive, or shot noise pro- 
cesses where the pulses .ire statistically independent of each 
other. An estimate of the pulse shape which is not constrained to 
have the minimum delay shape can be obtained, as well as an esti- 
mate of the amplitudes which the pulses had in the realization at 
hand. ith the latter, the distribution of the pulse amplitudes 
can be studied. If a feature in the distribution corresponding to 
the zero level of the amplitudes can be recognized, the background 
level of nonpulsed signal can be determined. 
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It is well-known that tb» fitting of sums of exponentials with 
unknown decay constants, as well as amplitudes, to data (e.g., 
radioactive decay data) is a very ill-conditioned problem. Since 
the exponential is in a sense the elementary pulse shape [see 
eqs. (21) through (23)] the deconvolution of MA* s is not unrelated 
to this problem. One of the difficulties is that the data can be 
nearly equally well represented by somewhat different models (dif- 
ferent in form and in the values of the model parameters) . The 
search for the best dependence measure (see SIVc and §VIa) was 
basically a quest for a procedure which minimizes the indeterminacy 
in the model fitting. In this respect, the one adopted (D^) , is 
generally superior to the others considered. It makes full use of 
the data at hand and has a well-defined and unique minimum in 
situations where the other measures have many shallow minima. The 
following points should be considered by anyone using this technique 

(1) As with conventional time-domain modeling, the identifica- 
tion of the form of the model (even within the context of ARIMA 
models) is an important problem which does not have a precise 
general solution. 

(2) Since any stationary process has MA, AR, and ARMA repre- 
sentations, the successful modeling of time-series data with a 
specific model does not guarantee that the structure of the physi- 
cal process has been correctly represented. 

(3) Since the data are always exactly reproduced by the 
model, the meaning of successful modeling is not based on the 
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smallness of the residuals between the sampled and modeled values 
of X , but rather on the degree to which the resulting amplitudes 
are independently distributed (e. g. , as measured by the smallness 

of V- 

(4) As with conventional modeling, including spectral analysis 
trends in the data can affect the results in very significant ways. 
There is no totally objective and automatic procedure for removing 
trends. There is no dependable way that an apparent trend can be 
distinguished from a statistical fluctuation in the underlying 
random process. Detrending should be done cautiously, and one 
should be suspicious of apparent trends. 

(5) The algorithm provided in the appendix is quite time con- 
suming, especially for long data arrays. Only minor efforts to 
speed up the computations have been made. Improvements in the 
algorithm can undoubtedly be made. Hopefully there is some approxl 
mation that can be used for large N. 
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APPENDIX 


THE ALGORITHM 

The FORTRAN code given below is a nearly self-contained pro- 
gram which will enable the reader to use the deconvolution tech- 
nique (based on cumulative probability functions) . The only miss- 
ing element is the FFT routine, which is a standard one, available 
in m^st program libraries. 

The MAIN program reads the value of m*, the data, the length 
of the AR filter (LAC), the position within the filter of the pre- 
diction point (MPT), the initial guessed solution (AOLD), and the 
number of times the order of the model is to be increased (NUMIT) . 
The Fourier transform of the data is put in the arrays XR and XI, 
for that is the form in which the data will be referenced hence- 
forth. The subroutine F2DC carries out the minimization, starting 
with a given solution, and returns the resulting minimum value of 
Dp (RES). This is done first with the input guessed solution, and 
then the order is increased in steps of one as indicated. The two 
minimum values RES1 {corresponding to A(new) = [A(old),0]} and 
RES2 {corresponding to A(new) « [0,A(old)]} are compared, and the 
smaller is selected. This procedure is terminated arbitrarily by 
the value of NUMIT. The correct order must be determined by 

inspection of the behavior of D (minimum) with increasing order, 

t 
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and inspection of the way in which the values of the parameters 
change, as discussed in the text. 

Subroutine F2DC carries out the minimization, trying 

restarts until the solution settles down. A criterion has been 

shown in terms of the minimum D „ , but one could also use criteria 

F 

in terms of the changes in the parameters. The program is written 
so that if three restarts are not sufficient, "DID NOT SETTLE" is 
written and the program continues. The rest of the program, from 
statement 3 on, is merely to evaluate the pulse shape C inverse to 
the converged A (in practice this should be printed or plotted, so 
that it can be seen how the pulse shape is changing as the proce- 
dure continues to higher orders) . Also calculated is the quasi-FPE 
quantity given in equation (184). This number should also be printed. 

Subroutine F2D sets up some constants that are needed in FUNK, 
computes the initial simplex using formulas given by Jacoby, Kowallk, 
and Pizzo (1972), calls the minimization routine, AMOEBA, and 
prints out the resulting AR filter. The program AMOEBA directly 
implements the simplex procedure as given in the references cited 
in the text. The criterion for convergence is in terms of the 
relative magnitudes of the maximum and minimum functional values on 
the simplex; this could be experimented with, as there are other 
equally valid convergence criteria. 

Function FUNK is the guts of the program, as it provides the 
values, as a function of the AR parameters, of the measure of 
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independence D„ which is to be minimized by AMOEBA. The evaluation 
of the innovation has been discussed in the text (§Vc). The order- 
ing of the innovation is important for an efficient evaluation of 
Dp and is carred out with sorting (SORT) , moving (MOVE) , and 
merging (MERGE) routines, all controlled by the main ordering 
program ORDER. These routines are based on material in the volume 
by Knuth (1973) and are such that the number of operations increases 
as NlogN. The only part of the procedure which produces an TV 2 
dependence is the summation over the two-dimensional grid. 

The structure of the recursion for the summand in equation (170) 
[see eq. (171)] can be understood by reference to Figure 39. This 
figure shows the two-dimensional grid of the reordered values 7? 
with * min(/? n ) and R ^ ■ ma x(7? n ) . A given R\ is paired with the 

R\ which was its wth removed neighbor in the original (unordered) 

J 

set {7?„}: 


R\ 

t 


R\ 

3 


7? 


(Al) 


n+ml 


This pairing is indicated by the dots at the grid points in the 
figure. In the example shown, 7?j is paired with 7?^, 7?^ with 7? 
and so forth. Each R\ is of course paired with no more than one 
R\. For m ■ 1, the R\ equivalent to 7? ^ has no mate, because 7?^^ 
is not defined. Similarly for the R\ equivalent to 7?j. Hence 
there is one row and one column without a dot. (Similar results 


hold for larger values of m. ) 
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FIG. 39. -The two-dimensional grid used in the computation of 
the estimate of the joint cumulative distribution function. This 
example is for m - 1 and N* ■ 10. Each of the N* - m (■ S) pairs 


(R n ,R n+m ) is indicated with a dot at the intersection of the grid 
lines for these values (but labeled in terms of the ordered ver- 
sion of the innovation, i?'). In this example, the original sequence 
was The numbers are the counts of 

the dots above and to the left of the box in which the number api iars. 
The counts in each row are always 0 or 1 more than the counts in the 
row above: 0 for boxes to the left of the dot in the row, and 1 
for boxes to the right [cf. eq. (171)]. To get the function F 2 , the 
counts must be normalized by the final count, N* - m * 9. 


































Now (Rp *Rq) is 1 f(N* - *•:) times the number of pairs (dots) 
above and to the left of the point (Rp t R^) [see eqs. (96) and (97)]. 
A running count of this number is kept for successive rows in the 
grid. Since there is only one (or no) new point per row, this row 
count increases by unity for all squares to the right of the new 
point in the row. This relation is expressed in the recursion 
formula (171). The figure shows an example with N * ■ 10. The num- 
ber in each box is the number of dots above and to the left of the 
uua. The entries in the last row and column of the grid are never 
utilized but are shown to indicate how the normalization works: 

,y ) for x > and y £ R^ is equal to the total number of 
dots (* N* - 1) divided by N* - 1. The individual cumulative dis- 
tribution is trivial in the system of ordered R* s: 


w -£ • 


(A2) 
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PROGRAM MAIN 

COMMON/F2DVEC/XR(1999),XI (1909), RR(1909),RI (1999) 
C0MM0N/F2DSCA/PACN,FACR,FAC1 

COMMON/F2DI NT/LDAT,NUMR,NR,MPT, LAC,N1D,N2D,MAXLAG 
COMMON/ I NOV/R( 1900) 

DIMENSION AOLD(20),A1(29),A2(29) 

DIMENSION DATA(IOOO) 

DO 1 1-1,1900 
XR(I )-9.0 
XKD-9.9 
RR( I )-0.0 

1 R I ( I )-9.0 
READ(8,50)MAXLAG,LDAT 
READ(8,51)(DATA( I ), I -1,LDAT) 

59 FORMAK3I3) 

51 FORMAK6E12.5) 

DO 2 I -1, LDAT 

2 XR(I)-DATA(I) 

CALL FFT(XR, XI, LDAT, LDAT, LDAT, -1) 

READ(8,59)LAC,MPT,NUMIT 

READ( 8,51 ) (AOLD( I ), I -1, LAC) 

CALL F2DC(A0LD,RES) 
i FiNUMIT . CO. 0) STOP 
DO 29 IT-1, NUMI T 
DO 19 1-1, LAC 
A2( I +l)-AOLD( I ) 

19 AKI)-AOLD(I) 

AKLAC+D-0.9 

A2(l)-0.9 

LAC-LAC+1 

CALL F2DC(A1,RES1) 

MPT-MPT+1 

CALL F2DC(A?,RES2) 

I F(RES1. LT. RES 2) GO TO 12 
DO 11 1-1, LAC 

11 AOLDC I ) -A2( I ) 

GO TO 20 

12 DO 13 1-1, LAC 

13 AOLDd )-Al(l) 

MPT-MPT-1 

29 CONTINUE 

STOP 
END 
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SUBROUTINE F2DC(A,RES) 

COMMON/F2nVEC/XP(1999),XI(l!n')),RR(l'>0')),RI ( 1000 ) 
COMMQN/F2DSCA/FACN,FACR, FAC1 

COMMON/F2DINT/LDAT, MUMP, NP, MPT, LAC,N1D,N2D,MAXLAG 
DIMENSION A( 20 ) 

CALL F2D(A,RES) 

RESOLD-RES 
DO 1 1-1,3 
CALL F2D( A, RES) 

Dl FRES»( RESOLD-RES) /RESOLD 
RESOLD-RES 

IF(DI FRES. LT.1.0E-4)00 TO 3 

CONTINUE 

PRINT 2 

FORMATU5H DID NOT SETTLE) 

CALCULATE, NORMALIZE, AND SHIFT PULSE 
DO 4 I -1, I, DAT 
RR< I >-0.0 
RKI )«9.9 
DO 5 1-1, LAC 
RR( I ) -A( I ) 

CALL FFT(RR,RI,LDAT,LDAT,LDAT,-1) 

DO 6 I »> ! . LDAT 
TEM-RRO)*2*PlO)**2 
PRO )-RR(l J/TEM 
RKI > — RKI >/TEM 

CALL FFT(RR, Rl , LDAT, LDAT, LDAT, +1) 

I MAX- 9 
CMAX-9.9 
DO 7 1-1, LDAT 
TEST*ABS(RR( I ) ) 

IF(TEST.OT.CMAX)IMAX-l 
IF(TEST.GT.CMAX)CMAX-TEST 
CMAX-RR( IMAX) 

DO 8 1-1, LDAT 
I NDEX-I -l+IMAX-LDAT/2 
I F( I NDEX. LT. 1) I NDEX-LDAT+ I NDEX 
I F( I NDEX. OT. LDAT) I NDEX-I NDEX-LD AT 
RKI)-RR(INDFX). *AX 
CALL PLOT(RI , LDAT) 

FPE-RES*FLOAT( LDAT* LAC )/FLOAT(MAXLAG*( LDAT-LAC) ) 

RETURN 

END 
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SUBROUTINE F2D(A,RES) 

COMMON/ F2DVEC/XR( 1909), XI ( 1000), RP( 1099 ),P I ( 1909 ) 
COMMON/F2PSCA/FACN, FACR, FAC1 

COMMON/F2DI NT/LDAT,NUMR,NR,MPT, LAC, N1D,N2I ,MAXLAG 
DIMENSION P(21,20),Y(21),X(21),A(20) 

DATA SCALF, I PR/1.0,5/ 

PRINT 50, (AC I ), 1*1, LAC) 

NACT-LAC-1 

NPOINT-NACT+1 

N1D-LAC+1 

N2P-LDAT 

NUMR-N2D-N1D+1 

NR-NUMR-1 

FACR-1.9/FLOAT(NUMR*NUMR) 

FAC1-1.0/FLOAT(NR) 

FACN-1.9/FLOAT(LDAT) 

PSUM-0.0 

J-9 

DO 1 1-1, LAC 

I F( I . EQ.MPT)GO TO 1 

J-J+l 

TEMP-A( I ) 

PSUM-PSUM+ABS( TEMP) 

P( 1, J)-TEMP 

1 CONTINUE 
FNUM-F LOAT ( NACT) 

TES-ABS(PSUM) 

IF(TES,LE.1.0E-3)PSUM-0.15 
QSC— SCALF-PSUM/FNUM 
TEMP-SQRT(FNUM+1)«1. 0 
DEN-FNUM-SQRT ( 2 . ) 

PN-(TEMP4FNUM)*QSC/DEN 
QN-TEMP*flSC/DEN 
DO 3 I -2, NPOI NT 
DO 2 J-l, NACT 

2 P(I,J)-P(1,J) + QN 

3 P(l,l-l)-P(l,l-l)+PN-ON 
DO 5 1-1, NPOI NT 

DO 4 J-l, NACT 

4 X(J)-P(I,J) 

5 Y(I)-FUNK(X) 

C** P IS NOW THE INITIAL SIMPLEX 
ITER-0 
• PR I NT- 1 PR 

CALL AHOEBA(P,Y,NPOINT, ITER, IPPIMT) 

J-0 

DO 10 1-1, NACT 
IF( I . EO .MPT) J-J+l 
J-J+l 

10 A( J)-P( I PRI NT, I ) 

A(MPT)-1. 0 
RES-Y(I PRINT) 

PRINT 50, (A( I ), I -1, LAC) 

50 FORMAT(5G14.6) 

RETURN 
END 
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SUBROUTI NE AMOEBA( P, Y, NPOI N, I TER, I PR I N> 

DIMENSION P<21,20>,Y<21>,PR(29),PRR(20>,FBAR<29>,PINV(29> 
EQUI VALENCE(PI NV,PRR),(YPRP, YPI NV> 

DATA ALPHA,BETA,GAMMA,TOL/l,9 # 9.5, 2. 0,1.9E-03/ 

DATA NSTOP/150/ 

NVAR-NPOIN-1 
519 CONTINUE 

I ILO-1 
IHI-1 
INHI-1 

DO 10 I ■l / NPOI N 
Yl -YC I ) 

IFtYI .GE.Y(ILO)) GO TO 1* 

ILO-I 

10 CONTINUE 

DO 11 I *1, NPOI N 
Yl -YC I > 

I F( Yl . LE. Y( I HI ) )GO TO 11 
IHI-I 

II T9NTINUE 

I F( I HI . EQ. 1) I NHI -2 
DO 12 I »1, HPOI N 
IF(I .EO.IHDGO TO 12 
Yl »Y( I ) 

I F( Yl .LE.YdNHI ) ) GO TO 12 
I NH I ■ I 

12 CONTINUE 

IF(MOD(IT!iR,IPRIN).NE.O) GO TO 209 
ERR»100 .*(Y( I HI )-Y( I LO))/Y( I LO) 

121 PRINT 205,Y(I LO),ERR 

205 FORMAT( 1P,G13.4,F6.3) 

206 DIF-Y(IHI)-YOLO) 

RAT»DI F/Y( I NHI ) 

IF(RAT. LE.TOL)GO TO 80 

I F ( I TER .GE. NSTOPJGO TO 84 
IF(IGO.NE.O) GO TO 80 
209 ITER-ITEP+1 

DO 21 l-l,NVAR 

21 PBAR( I )»9. 

DO 23 I *1# NPOI N 
I F( I .EQ.IHI) GO TO 23 
DO 22 U-1,NVAR 

22 PBAR(J)-PBAR(J)+P(I,J> 

23 CONTINUE 

DO 24 l-l,NVAR 

24 PBAR(I )»PBAR( I )/NVAR 
DO 25 U-1,NVAR 

25 PR(J)»(1.+ALPHA)*PBAR( J)-ALPHA*P( I H I ^ J > 

YPR-FUNK(PR) 

258 I F( YPR . LE,Y( I LO) ) GO TO 30 

I F(YPR ,GE,Y( I HI ) ) GO TO 40 
I F(YPR,GE.Y( I NHI ) ) GO TO 38 
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26 DO 27 J-1,NVAR 

27 P(IHI,J)-PR(J) 

YC I HI )»YPR 

60 TO 1 

30 DO 31 J«1,NVAR 

31 PRR ( J ) -GAMMA* PR ( J ) ♦ ( 1 . -GAMMA) * PB AR ( J ) 
YPRR-FUNK(PRR) 

YTEST-Y(ILO) 

IF(YPRR.GE.YTEST) GO TO 26 
319 DO 32 J-1,NVAR 

32 P(IH|,J)«PRR(J) 

YC I HI )«YPRR 

GO TO 1 

38 DO 39 J«1,NVAR 

39 P(IHI,J)»PRCJ) 

YC I HI )»YPR 

40 DO 41 J-l, NVAR 

41 PINV(J)-BETA*P(IHI,J)+( 1. -BETA) *PBAR( d ) 
YPINV-FUNK(PINV) 

IFCYPINV.GE.YCIHI )) GO TO 50 

DO 42 d«l,NVAR 

42 PCIHI,J)-PINV(J) 

Y( I HI )«YPI NV 

GO TO 1 

50 DO 55 l=l,NPOIN 

IF(I.EQ.ILO) GO TO 55 
DO 53 J»1,NVAP 
PR(d)-0.5*(P(l,d)+P(ILO,d)) 

53 P(l,d)»PR(d) 

Y(l )«FUNK(PR) 

55 CONTI MU E 

60 GO TO 1 

80 IPRIN-ILO 

RETURN 

84 PRINT 841 

841 FORMATP DID NOT CONVERGE') 

I PR I N* I LO 

RETURN 

END 
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FUNCTION FUNK(PAR) 

C** VERSION OF JUNE 4, 1979, . .CUMULATI VE DISTRIBUTION 
DIMENSION PAR(20), 1 1 ND(1000) 

DIMENSION ROW(1000),IRANK<1099),NP1<1099) 
COMMON/F2i)VEC/XR( 1900), XI < 1999), RR( 1909), Rl( 1999) 
COMMON/F2DSCA/FACN,FACR, FAC1 

COMMON/ F2D I NT/ LDAT, NUMR, NR, MPT, LAC, N1D, N2D,MXLAO 
COMMON/ I NOV/R( 1999) 

DO 2 1-1, LDAT 
R( I ) -0.0 
RR( I )-0.9 

2 RKD-9.9 

C** PUT FOURIER TRANSFORM OF A INTO (RR,RI ) 

JJ-9 

DO 20 1-1, LAC 

I F( I . EQ.MPT)GO TO 29 

JJ-JJ+1 

RR( I )«PAR( JJ) 

20 CONTINUE 

RR(MPT) -1 . 9 

CALL FFT(RR,RI , LDAT, LDAT, LDAT,-1) 

C** DERIVE INNOVATION (-A*X) WITH FOURIER TRANSFORMS 
DO 3 1-1, LDAT 

QR-XRC I )*RRC I )-XI < I )*RI C I ) 

Ql -XR( I )*RI ( I )+XI ( I )*RR( I ) 

RR(l)-OR 

3 RICD-OI 

CALL FFT(RR,RI, LDAT, LDAT, LDAT, 1) 

DO 4 1-1, LDAT 

4 RR( I )-RR( I )*FACN 

C** SHIFT, ORDER, AND DIFFERENCE INNOVATION 
DO 5 I -N1D, N2D 
INDX-l-MPT+1 
I F( I NDX . LE . 9 )GO TO 49 
R( I NDX)-RR( I ) 

49 CONTINUE 

INDX-l-NlD+1 
RR( I NDX) -RR( I ) 

5 CONTINUE 

DO 51 1-1, NUMR 

51 I I ND( I )-l 

CALL ORDER(RR, I I ND, I RANK, NUMR) 

DO 52 1-1, NUMR 
I NDY-I I ND( I ) 

Rl ( I )-RR( INDY) 

IRANK( I NDY)-I 

52 CONTI K 

C** OLD-IINDv ;W) 

C** NEW- 1 RANK sOLD) 

DO 54 J-l, NUMR 
RR(J)-RI( J+D-RKJ) 

54 CONTINUE 


1 
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i 


! 
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C** 

C** INTEGRATE DR( I )DR( I +1) (F2(R( I ),R( 1*1) )-Fl(R( I ) ) F 1 < R ( I +1) ))**2 
G*« “ROW” IS ROW OF THE MATRIX REPRESENTING THE CUMULATIVE 
C** DISTRIBUTION FUNCTION OF (R< I >,R( l+D) 

C** 

FUNK-0.0 

DO 80 l AG- 1, Ki AX LAG 
FAC1»1.0/FLOAT(NUMR-LAG) 

DO 58 l-l,LDAT 
ROW(I)-0.0 

58 NPKD-NUMR 

DO 59 J-1,NUMR 
I NOY-t I ND( J)+LAG 
IF(INDY.GT.NUMR)GO TO 59 
NPl(J)-IRANKONDY) 

59 CONTINUE 
FSUM-0.0 

DO 64 J-1,NR 
DR-RR(J) 

I JUMP-NPl(J) 

FAC2-FLOAK J)*FACR 
DO 60 1-1, NR 
I F ( I ,GE, I JUMP) GO TO 61 

FSUM-FSUM+DR*RR( I )*(ROW(| )-FAC2*FLOAT< I >)**2 

60 CONTINUE 
GO TO 64 

61 CONTINUE 

DO 62 K-I,NR 
ROW(K)-ROW(K)*FACl 

FSUM-FSUM*DR*RR(K)*(ROW(K)-FAC2*FLOAT(K))**2 

62 CONTINUE 

64 CONTINUE 

FUNK-FUNK+FSUM 
80 CONTINUE 

RETURN 
END 
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SUBROUTINE ORDER( D, 1 1 , JJ,N) 
DIMENSION ll(N),JJ(N),D(N> 
K-l 

?F(K.GE.N) RETURN 

CALL SORT(D, 1 1 , JJ,K,KK,N) 

K-KK 

IF(K.GE.N) GO TO 15 
KK-K+K 

CALL SORT(D,JJ,l l,K,KK,N) 
K-KK 

GO TO 10 
DO 16 l-l,N 
IKI)-JJ(I) 

RETURN 

END 


SUBROUTI NE SORT(D, I I , JJ, K, KK, N) 

DIMENSION 1 1 (K,l), JJ(KK,1) 

M-N/KK 

IF(M.LE.O) GO TO 25 
DO 20 J-1,M 
I -J+J 

CALL MERGE (D, 11(1, 1-1), K, 11(1, l),K,JJ(l,J)) 

LEFT-N-KK*M 

IF(LEFT.LE.O) RETURN 

Ml-M+1 

MM1-M+M1 

I F ( LEFT . LE. K) GO TO 30 

LEFT-LEFT-K 

MM2-M1+M1 

CALL MERGE(D, 1 1 ( 1,MM1 ) , K, M ( 1, MM2 ) , LEFT, ,JJ( 1, Ml ) ) 
RETURN 

CALL MOVEO I ( 1,MM1) , JJ( 1,M1 ) , LEFT) 

RETURN 

END 


SUBROUTINE MOVE(X,Y,N) 

INTEGER X, Y 
DIMENSION X(1),Y( 1) 

NA-IABS(N) 

I F(NA, LE. 9, OR . MA.GT, 10090) RETURN 
I F( N) 10,39,29 
DO 15 I -1,NA 
Y( I ) — X( I ) 

RETURN 

DO 25 1-1, NA 
Y( I ) -X( I ) 

RETURN 

END 
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SUBROUTINE MERGE(D,X,N,Y,M,Z) 
INTEGER X,Y,Z 

DIMENSION X(N),Y(M),Z(1),D(1) 
NM-N+M 
J-l 
1-1 
JGO-1 

IF(M.EQ.O) JGO-3 
IF(N.EQ.O) JGO-2 
DO 30 K-1,NM 
JX-X(J) 

IY-Y(I) 

GO TO (10* 25,29), JGO 
10 IF(D(JX).GT.D(IY>) GO TO 15 

ZOO-JX 

IF(J.EQ.N). GO TO 17 
J-J+l 
GO TO 30 
15 Z(K)-IY 

IF( I ,EQ.M) GO TO 19 
I -I ♦! 

GO TO 30 
17 JGO-2 

GO TO 30 

19 JGO-3 

GO TO 30 

20 Z(K)-JX 
J-J+l 

GO TO 30 
25 ZOO-I Y 

I -I ♦! 

30 CONTINUE 

RETURN 
END 
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Absolute value (Lj), 30, 48, 68, 102-109 , 114-117, 121, 128, 137-139 
Acausal, 22, 26, 29, 31-33, 50-51, 55-57, 58, 63-67 
Advance operator, 52 
All pass filter, 83 , 84 

Autocorrelation (function) 19, 37, 42, 60-62, 68, 72-73, 79-80, 83, 85, 134 
Autoregressive (AR) (model, process, representation), 1, 31-38 , 40, 41, 43, 
73-75, 82, 86-87, 119, 135-159 
integrated moving average (ARIMA), 42-43 , 87, 119 
moving average (ARMA), 40-41 , 42, 75-76, 86-87 

Bins, 91-93, 122, 137-138 

Causal, 26, 28-29, 31, 33, 4t), 47-48, 50, 55-58, 63-67, 75, 78-79, 98, 112 
Central limit theorem, 23 

Characteristic function (see also joint characteristic function), 16 , 18, 
87, 90, 93, 121, 138 

Computation (numerical experiments), 135-176 , 178, 179-192 
Constant component , 164, 166, 172-176 
Convolution, 31, 52-54 , 62, 64, 82, 85, 170 

Cumulative distribution function (see also joint cumulative distribution 
function), 16, 18, 87, 90, 121-123, 138-139, 180-183, 189-190 

Decomposition, see Wold Decomposition 
Deconvolution, 74-75, 85, 96, 135 

tables, 138, 139, 142-143, 151-152, 155, 168 
Deterministic, 16-17 , 20-21, 29, 74-75, 77, 80, 96, 162 
Delay character (also phase character), 47, 58-62 , 79, 85-86, 109, 136 
operator, 52 

Dependence (dependently distributed, dependence measure), 19, 20, 86-95 , 
121-123, 136-139, 189 
Difference operator (y), 42 , 43 
Dipole (couplet), _55, 58, 82 

Discrete Fourier transform (DFT) , see Fourier transform 
Ergodic, 8, 88 

Estimates, statistical, 75-76, 89, 97-98 
Expected value, J14, 18, 77, 87, 121 
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Factorisation (into dipoles) , 55-58, 62, 64 
Filter (see also pulse shape), 24, 28, 31, 46-73 


continuous, 134-135 

Final prediction error (FPB), 102, 113-114, 127-128, 14Q, 143-144 , 159, 166 
Fourier transform, 1, 51, 63, 66-71, 119-121, 129, 166, 179, 184-185, 189 
Frequency domain, 37, 51, 58 

Gaps, see Sampling 

Gaussian noise, see Noise, Gaussian 

process (normal process), 8, 85-86, 136 

Identically and independently distributed (i . i . 4) , 19 , 23 
Identification (see also order determination), 4, 73, 112-114, 119, 177 
Impulse, 26, 36, 38, 46 

Independent (independently distributed) , JL4 (random variables) , 18^ (processes) 
20-21, 178 

Independently distributed innovations, 30, 78, 81, 85-96 
noise, see Noise, independently distributed 
Innovation, 30, 46, 76-77, 78-81, 85-87, 100, 119-121, 131-132, 145, 147, 

153, 157, 160, 163, 173, 181 

Inverse (convolutional), 49, 62-68 , 70-71, 75, 80, 82, 111*112, 116-117, 129, 176 
Joint characteristic function, 16, 18, 23, 87 

cumulative distribution function, 15, 23, 87, 90-91, 122-123, 189-190, 181-183 
probability distribution function, 14-15 , 23, 87, 91 

Lag (m* - maximum lag), 89-90, 92, 94, 124, 126, 138-156, 184 

Least-squares, 6, 68, 85, 96, 102 

Linear system, see Filter 

Local minimum, 92, 123-124, 126, 177 

Martingale difference property (MDP), 1!>, 81, 95-96, 137-138 
Maximum delay (or phase), 58-62 , 63-65, 79-80 
entropy method, 101-102 
Mean value, 19, 28, 43 
Memory, 28, 31, 32, 74, 97 

Minimization (optimization, deconvolution), 86, 88, 91, 97, 99-100 , 104-112 , 
114-117, 123-129 , 133, 135, 140, 142-143, 151-152, 155, 168, 179-180 
Minimum delay (or phase), 49, 58-62 , 63-66, 75, 78-80, 100, 102, 104, 166, 169-170 


205 


Mixed delay, 59, 78-80, 109, 169, 172 
Models, 6, 7, 9, 24, 30, 73-117 
Moment 24, 94-95, 121, 138 
generating function, 16 

Moving average (MA) (model, process, representation), 1, 24-31 , 32, 38, 40, 41, 
43, 46, 72, 73-87, 80, 85-87, 104, 120 
Negative amplitude, 159, 170-172, 174-175 
Noise, 23, 138-139, 156 

Gaussian, 23, 37, 85-86, 136, 139, 159-162 
independently distributed, 23, 37, 81, 86, 159 
uncorrelated, see Noise, white 
uniformly distributed (0), 11, 25 , 44, 135-158 
white, 23-24, 33, 41, 96, 136 
Nonstationary, 42-43 
Norm, 102, 104-106 

Normalization, pulse, 30-31, 33, 110-113, 185 

One-sided (pulses, filters, representations; see also causal), 48, 98, 102, 111 
Optimization, see Minimization 

Order (of a process), see also Final prediction error, 28-29, 32-33, 113-114, 
125-129, 149 

Ordering (according to magnitude), 122, 181-182, 191-192 
Origin of time, 26, 48, 49 (notation), 59, 67, 69 

Parsimony, 41^ 

Partial energy curve, pulse, 59 , 60-61 

Periodic signals (quasi-periodic signals), 33, 35-36, 37 

Phase character, pulse, see delay character 

Physically realizable, 48 

Poisson process, 44 

Prediction (predictive deconvolution, predictive decomposition), 3, 17, 22, 
75-77, 96-114, 133 

error (see also Innovation), (prediction error filter), 76-77, 98-102, 

103, 104-105, 109, 121, 134, 174 

Probability distribution (see also Joint probability distribution), _U, 18, 

87, 90, 121, 137-139 
Process, 3, 9, 10, 11-12, ^5 

Pulse shapes (see also Filter, Impulse), 30, 44, 46-73, 85, 129-131, 146, 148, 
154, 158, 161, 165, 169, 174, 185 
, exponential, 36, 39, 48, 70, 81-83, 136, 177 
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Pulse rate, 44-45, 100 

amplitude (see also Innovation), 28, 44-45, 46, 85, 100, 175 
amplitude distribution, 175 
Purely random, 17, 19 

Quasar, 3C 273, 164, 166-172, 174-176 

Random process (stochastic process), 3, 6-7, 9-23, 12., 17, 20-21 
Realization (relaization of a specific process), .10, 11-13, 17, 
37, 140-141, 150, 156, 163, 167 
Restart, 123, 124, 180, 185 
Reverse, time, 60 , 62, 80, 82, 101 


, 22, 80 
21, 25, 35, 


Sampling, 4, 8, 44, 47, U8, 132-135 
Sequential analysis, 10 

Shot noise (model, process), 7, 26, 43-46, 136 
Simplex, 123-125, 180, 187-188 
Sinusoidal signal, 162, 165 

Skewness, time skewness function, see Time skewness 
Skew-norm, 104 

Spectrum, 37, 42, 60, 72-73, 75, 79, 101-102, 135, 178 

Stability, filter (convergence), 29, 33, 38, 48, 63, 75, 78, 106, 162 

Stationary, 1, 19, 20-22 , 42, 74, 79-81, 89 

Stochastic process, see Random process 

Summation operator (S), 42 


Time domain, 3, 9, 52, 58 

series (see also Realization), 4, 5, 9-10, 167 
skewness (time skewness function), 95, 101, 105-109 
Trend (detrending), 11, 43, 75, 178 

Two sided filters (see also Acausal) , 22, 29, 32, 33, 38, 40, 41, 48, 50, 
55, 66, 79, 109-114, 136 


Uncorrelated (see also Noise, white), JU., ^9, 20-21, 74, 78-81, 85 
Uneven sampling, see Sampling 

Uniformly distributed noise, see Noise, uniformly distributed 
Unstable, see Stability 

Varimax norm, 121 

Wavelet, 46, 7J| 
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White noise, see Noise, white 

Wold decomposition (Wold theorem and extension), 22, 74-77 , 73-85 
Wraparound, 67, 69 

Yule-Walker equations, 99-100 

Zero (o£ Z transform), 57, 64-66, 68 
Z transform, 40, 49-52, 54, 55-57, 62-68, 72, 82 
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